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Abstract .  Statistical  inference  in  Cox's  regression  model  is 
usually  carried  out  using  traditional  (first  order)  large 
sample  theory.  In  the  spirit  of  earlier  success  stories  one 
might  try  to  bootstrap  data  in  order  to  better  assess  the 
sampling  variability  of  the  Cox  estimator.  Such  a  bootstrap 

7  A  •  < 

scheme  is  proposed  in  the  present  paper.  An  asymptotic  justi¬ 
fication  is  given,  showing  that  inference  based  on  the  bootstrap 
procedure  is  first  order  equivalent  to  the  standard  one.  The 
problem  of  constructing  more  accurate  moderate-sample  confidence 
intervals  is  also  addressed,  employing  second  order  fine-tuning 
of  the  bootstrap.  / 
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We  consider  a  Cox  regression  model  of  the  following  form:  X°,...,X°  are  independent 
lifetimes  for  n  individuals.  X?  has  continuous  intensity  (or  hazard  rate) 

6zi 

a^(s)  =  a(s)  e  (1.1) 

where  is  a  covariate  measurement  for  individual  no.  i,  a(.)  is  left  unspecified, 
and  6  is  the  parameter  of  primary  interest.  Observed  is  (X^,  6 ,  i  *  l,...,n, 
where 


X±  *  min{x”,  Ci}, 


I{Xi  -  ci}* 


(1.2) 


c  is  the  "censoring  time"  for  no.  i. 

We  will  assume,  for  simplicity  and  for  ease  of  exposition,  that  z^'s  and 
c^’s  are  non-random  and  that  6  (and  z^)  is  one-dimensional.  Generalisations 
are  possible  in  several  directions,  see  Kalbfleisch  and  Prentice  (1980),  Andersen 
and  Gill  (1982),  Prentice  and  Self  (1982,  1983),  Cox  and  Oakes  (1984),  Andersen 


and  Borgan  (1985).  The  important  p-variate  case  is  treated  in  Section  5. 
Define 


Ni(t)  =  I{X°  <_  t,  X°  <_  c±}, 
Y  (t)  =  l{X°  >  t,  c±  >,  t}. 


(1.3) 


Cox'  partial  likelihood  can  be  written 

n  Y  (s)exp(Sz.)  dNi(s) 

L(6)  =  n  n  (— J - - -  }  ,  (1.4) 

i=l  s>0  E.  Y. (s)exp(Bz .) 

J  =  1  J  J 

cf.  Gill  (1984).  Cox'  estimator  8  is  the  8  value  maximising  L(8)  or  equivalently 

n  »  n  Bz 

log  L(B)  =  E  /  [Bz.  -  log  {  E  Y  (s)  e  -*}]  dN  (s) . 
i-1  o  j=i  J  1 

This  function  may  be  seen  to  be  concave  so  that  8  also  may  be  defined  as  the 


solution  to 


U<8)  -  3 log  L(8)/38 

n  *  oW/  os 

-  Z  /  { z  -  dN.(s)  -  0,  (1.5) 

1-1  0  1  S{0)(s,8)  1 

where 

S(k)(s,8)  =  -  I  (z.)k  Y.(s)exp(Bz  ),  k  -  0,  1,  2.  (1.6) 

n  j=i  J  3  j 

t 

We  shall  also  need  a  nonparametric  estimator  of  A(t)  *  /oa(s)ds,  the 
cumulative  intensity  for  individuals  with  z  =  0.  The  natural  estimator  is 
t  dN(s) 

A(t)  *  /  — - - 7 -  ,  t  >  0, 

0  Yj (s)exp(8Zj) 

cf.  Johansen  (1983),  Andersen  and  Borgan  (1985),  where  N(.)  *  Ej^N^(.).  This 
is  a  step  function  with  jumps  exactly  at  observed  life-lengths,  i.e.  in 

J  =  {t:  AN^(t)  =  1  for  some  i).  (1.7) 

(AB(t)  =  B(t]  =  B(t)  -  B(t-)  for  right  continuous  functions.)  Since  these  jumps 
estimate  conditional  probabilities  we  modify  the  estimator  above  very  slightly 
so  as  to  get  jumps 

AN(s)/n 

AA(s)  *  min  - 7-  ,1},  s  £  J.  (1.8) 

SW(s,B) 

Our  main  concern  in  this  paper  is  assessing  the  variability  of  6,  by 
estimating  its  bias,  its  standard  deviation,  and  by  constructing  confidence 
intervals  for  8  based  on  8.  There  exist  answers  to  these  questions  based  on 
large  sample  theory,  reviewed  in  Section  2,  but  the  approximations  involved  may 
be  coarse.  Recent  success  stories  for  the  bootstrap,  see  for  example  Efron 
(1982a,  1985a,  1985b),  Bickel  and  Freedman  (1981),  Singh  (1981),  Beran  (1982), 
Freedman  and  Peters  (1984),  Abramovitch  and  Singh  (1985),  indicate  that  the 
sampling  variability  of  8  may  be  more  accurately  computed  using  bootstrap 
procedures. 


Such  a  bootstrap  method  is  now  described.  There  are  continuous  c.d.f.s 


F,  F, having  respectively  a,  a,,..., a  of  (1.1)  as  their  intensities;  in 
In  in 

fact 


F(t)  *  1  -  exp{-A(t) } , 

F^(t)  »  1  -  expf-A^t)}, 

where  A,  A, , ....  A  are  the  cumulative  intensities.  We  could  use 
1  n 


(1.9) 


F(t)  *  1  -  exp{-A(t)  } , 

F^(t)  *  1  -  exp{-A(t)exp(£2j) ) 
as  estimators,  but  prefer 


F(t)  -  1  -  n  t]  (1  -  AA(s) } , 


Fi(t)  -  1  -  n  t]  U  -  AA(s) } 


exp(gz^)  (1.10) 


A  is  a  step  function,  corresponding  to  a  probability  distribution  concentrated 
on  the  set  J  of  (1.7),  and  this  distribution  is  exactly  F  above,  cf.  Kalbfleisch 
and  Prentice  (1980,  Ch.  2).  Furthermore,  the  canonical  analogue  of  (1.1)  for 
discrete  distributions  is 

exp(Bz  ) 


1  -  AA^(s)  =  {1  -  AA(s)} 


(1.11) 


cf.  Kalbfleisch  and  Prentice  (op.  cit.,  Ch.  4).  This  leads  to  above. 


Now  generate  independent  realisations 


.o* 


Xt  ~  F  ,  i  =  1,...  ,n 


and  define 

Xi*  =  min{X°*,  Ci},  6  *  =  I{X°*  £  c^.  (1.12) 

*  *  *  *  ** 

(X,  ,5,  ),..., (X  ,6  )  is  our  bootstrap  data.  Calculate  6  as  in  (1.5),  but 

11  n  n 

based  on  the  bootstrap  sample. 

G(t)  -  Pr*(S*  ^  t} 

a  * 

is  the  bootstrap  distribution  of  8  ,  which  in  practice  is  obtained  as 


(1.13) 


>Xtj 


for  a  large  number  BOOT  of  independent  realisations  B  of  8  .  This  requires 
of  course  computing  power  and  time. 

The  bootstrap  idea  is  that  G,  the  distribution  of  B  given  data,  approxi¬ 
mates  G,  the  sampling  distribution  of  8,  or  more  ambitiously  that 


Q(8  ,  B) (data  ~  Q(8,  B) 


(1.15) 


for  any  well-behaved  function  Q. 

Summoning  the  necessary  amount  of  courage  to  follow  this  principle,  we 
can  define 

(A)  The  bias-corrected  estimate  8:  write  E(8  -  B)  “  bias.  Since 
(8  -8) |data  ~  8  -  8 

the  bootstrap  estimate  of  bias  is 


E*(8  -  8) 


.  BOOT 

ioSr*  (l 

b*=l 


Hence 


1  B00T  *h 

6  *  6  +  biaSBOOT  4  BOOT  .  1 \  § 

b*l 

should  have  less  bias  than  6. 


(B)  The  bootstrap  estimate  of  standard  error  is  similarly 

a  =  {Var  (B  )  }”*  A  ( - - —  r 

BOOT  1  )!  SoOT-l 


E  (S  -  B) z) 


a  _  intends  to  estimate  (Var  8)  .  One  may  also  employ  more  robust  versions 
dUUI 

of  a  .  That  this  sometimes  is  necessary  is  explained  in  Rey  (1983)  and 
dUUI 

Parr  (1985),  for  example;  see  also  a  comment  in  Efron's  reply  to  the  discussants 
in  Efron  (1981b) . 

One  may  also  estimate  for  example  E | B  -  8|  and  {E(B  -  3)2}2  in  similar 


(C)  The  bootstrap  percentile  interval: 

G_1(o)  <  &  <  G~1(l-a) 

should  have  coverage  probability  close  to  l-2a.  This  interval  can  further  be 
corrected  in  various  subtle  ways;  this  is  the  topic  of  Section  4. 

Section  2  reviews  some  known  large  sample  theory  for  Cox  regression. 
The  same  methods  of  proof,  essentially  elegant  modern  martingale  techniques, 
are  supplemented  with  brute  force  arguments  in  Section  3  in  order  to  arrive 
at  an  asymptotic  justification  for  the  bootstrap  procedure,  for  example 
>/n  (g  -  g) [data  ~  /n  (8  -  8). 

One  may  also  prove  the  stronger  result 


* 

(B  -  B) 

' 

^  (B  -  B) 

| data  ~ 

A 

{A  (.)  -  A( . ) } 

J 

v^n  {A(.)  -  A( . )  } 

J 

The  going  gets  tougher  in  Section  4  as  we  attempt  to  correct  the  percentile 
interval  f jt  bias  and  acceleration,  which  is  the  bootstrap  way  of  doing  second 
order  asymptotics.  Essentially  Efron's  (1985b)  program  is  followed,  but  since 
the  nuisance  parameter  a(.)  is  infinite-dimensional  some  intermediate  analysis 
in  an  approximating  finite-dimensional  model  is  necessary. 

An  important  ingredient  in  our  construction  of  better  intervals  is  the  so 
called  acceleration  factor.  It  turns  out  that  the  calculation  of  this  factor 
involves  finding  the  skewness  of  a  certain  martingale,  a  technical  problem 
solved  in  the  Appendix,  where  also  other  mathematical  necessities  are  dealt  with. 
Some  potentially  interesting  other  uses  of  the  skewness  formula  are  pointed  out 
in  Section  5,  along  with  some  other  remarks. 


The  results  of  our  efforts  Is  a  confidence  interval  (4.18)  that  should 
be  more  accurate  than  the  standard  one,  but  which  requires  much  more  computation. 
The  bootstrap  sample  size  BOOT  in  (1.13),  (1.14)  should  be  at  least  1000. 
Importance  sampling  and  possibly  smoothing  tricks  to  estimate  the  necessary 
quantiles  of  G  can  perhaps  be  used  to  speed  up  convergence. 


Remarks . 


AF^t)  - 
AAi(t)  » 


(1)  is  simulated  from  F^,  which  has  probability  masses 


exp(|3z  )  ^ 

H[0  t)  {1  -  AA(s) }  1  AAi(t), 


1  -  (1  -  AA(t) ) 


exp($z  ) 


(1.16) 


for  t  e  J  of  (1.7).  We  could  also  have  used  defined  after  (1.9),  and  the 
results  would  be  the  same  in  an  asymptotic  sense.  However,  looking  at  AF^(t) 
reveals  that  the  difference  could  be  appreciable  for  small  and  moderate  sample 
sizes,  i.e.  exactly  in  situations  where  the  bootstrap  is  called  for  to  fine- 
tune  large  sample  approximations.  Notice  that  F^(t)  >  F^(t)  always. 

The  choice  of  F^,  as  opposed  to  Fi  or  for  example  smoothed  versions,  is 
faithful  to  the  original  bootstrap  spirit  in  that  it  reduces  to  the  usual 
Kaplan-Meier  estimate  when  8  =  0  (or  when  every  z^  «  0)  (and  A  is  reduced  to 
the  usual  Nelson-Aalen  estimator) ,  and  is  in  this  case  also  equal  to  the 
ordinary  empirical  distribution  function  when  no  censoring  is  present. 


(2)  The  bootstrap  scheme  proposed  above  is  particularly  suited  to  the 
case  where  the  censoring  times  c^  are  known.  (If  there  is  no  censoring  at 
all  then  each  c^  =  <=.)  It  utilises  the  information  about  c^  and  z for  individual 
no.  i  effectively.  A  typical  example  of  this  "fixed  censorship"  situation  is 
one  where  individuals  enter  the  experiment  at  different  times,  but  where  there 
is  a  fixed  "data  collection  day",  as  was  the  case  with  the  data  Efron  (1981a) 


considers. 


In  other  situations  the  random  censorship  model  could  be  more  appropriate. 


Then  c^,  c^,...  are  considered  as  being  independent  of  X^'s  and  with  a  common 

c.d.f.  R.  Our  bootstrap  procedure  should  then  be  modified  as  follows:  If  6^  *  0, 

* 

so  that  is  actually  observed,  use  c^  =  c^.  If  <5^  =  1,  i.e.  c^  is  not 

* 

observed,  generate  a  likely  outcome:  c^  ~  R,  the  Kaplan-Meier  curve  for  R. 

Or,  slightly  more  fancy:  If  6  =  1  then  one  knows  that  the  unobserved  c ^  is 

*  «  . 

to  the  right  of  X^,  cf.  (1.2),  so  one  should  perhaps  generate  c  from  R/R(X^,°°) 
In  any  case  one  uses 

*  ,  o*  *,  *  ,  o*  *, 

Xt  -  min{X  ,  c  },  5  =  I{X  £  c  } 

instead  of  (1.12)  . 

ic  ic  a 

A  related  version  could  generate  all  c^  ,...,cn  independently  from  R; 
this  would  however  not  utilise  the  available  information  about  the  observed 


The  resampling  schemes  discussed  above  simplify  to  those  used  by  Efron 
(1981a)  in  the  case  of  no  covariates  (or  all  covariates  equal). 


(3)  If  z ^  is  dichotomous  (treatment/control,  say)  the  bootstrap  schemes 

simplify  to  close  relatives  of  one  used  in  Efron  and  Gong  (1982,  Section  7). 

More  generally,  if  the  z.'s  take  on  K  different  values  z ,z  ,v*  correspon- 

l  (.1;  W 

ding  to  K  treatments  or  groups,  then  the  bootstrap  procedure  amounts  to 

o*, 

generating  X^  s  in  groups,  say 

Xk,j  ~p00<t>  -  1  -  ”[0,tJ  11  -  “(«)»  1  '  .  j  -  1 . % 

where  n^  is  the  number  of  individuals  having  z  =  z^^*  k  =  1,...,K. 

O* 

Observe  that  the  X^  .  ’s  above  are  not  resampled  from  the  original 
observations ,  even  when  no  censoring  is  present.  Rather,  they  are  generated 
from  a  better  and  model-based  estimate  of  their  true  F^^  than  is  for 


example  the  usual  empirical  distribution  for  observations  from  group  k. 


(4)  An  important  and  related  comment  is  that  the  Cox  structure  (1.1) 
is  explicitly  relied  on  in  the  sense  that  if  the  model  is  wrong,  then  6  may 
display  a  different  sampling  variability.  Thus  the  second  order  correct 
confidence  intervals  for  8  we  construct  in  Section  4  are  not  necessarily 
even  first  order  correct  when  the  model  is  wrong  (then  the  meaning  of  the 
"true"  parameter  8  must  be  changed  to  the  "least  false"  parameter;  see  Hjort 
(1985)).  Similar  remarks  apply  to  the  confidence  intervals  constructed  under 
model  assumptions  in  Efron  (1985a,  1985b).  More  robust  confidence  intervals 
(for  the  least  false  parameter)  would  possibly  be  the  result  if  one  resampled 
the  triplets  (X^,  6^,  z  J  directly,  see  the  arguments  in  Efron  (1982a,  Section 
5) .  This  scheme  is  used  in  examples  in  Tibshirani  (1984)  and  in  Efron  and 
Tibshirani  (1985). 

Resampling  the  triplets  does  not  utilise  the  fine  structure  of  the  Cox 
model,  and  it  is  felt  that  the  bootstrap  scheme  proposed  here,  based  on  the 
best  available  estimated  model,  is  better  suited  to  catching  the  finer  aspects 
of  the  sampling  variability  of  8.  It  is  still  possible,  however,  that  the  two 
schemes  are  first  order  asymptotic  equivalent,  and  that  a  second  order  correction 
to  the  simple  method  can  match  the  second  order  correction  we  present  in 
Section  4  for  the  model-utilising  method. 

It  appears  important  to  sort  out  the  consequences  (at  least  asymptotically) 
of  using  the  simple  method  versus  the  model-based  method,  when  the  model 
is  correct,  and  for  specific  departures  from  the  model.  This  is  not  done  here. 


2.  Asymptotic  theory  for  Cox  regression. 


Introduce 


M  (t)  =  N  (t)  -  /  Y  (s)exp(6  z.)  dA(s) ,  i  =  l,...,n 
i  i  o  i  o  1 


(2.1) 


where  8o  denotes  the  true  value  of  8.  These  are  square  integrable,  orthogonal 


martingales  with  variance  processes 

t 

(M  ,  M  )(t)  =  /  Y . (s)exp(S  z  )  dA(s). 

11  0  1  ol 


(2.2) 


The  family  of  a-algebras  implicitly  referred  to  is 
Ft  -  a{Ni(s),  Y ^( s)  ;  s  <_  t},  t  _>  0. 

Assume  that  the  individuals  are  observed  over  a  time  interval  [0,  T] . 

Then  from  (1.5) 

n  T  S(1)(s  6) 

U(B)  -  I  /  { z,  ~  il}  dM.(s) 

i-1  0  1  S(°  (s , 8)  1 


n  T  S(1)(s  6) 

+  I  /  (z - t-jt v — Y  (s)exp(B  z.)  dA(s). 

i-1  0  1  S(0  (s,B)  1 


(2.3) 


The  second  term  vanishes  when  6=8  ,  i.e.  U(B  )  =  U(8  ,  T)  is  a  martingale,  with 

o  o  o 

.  .  n  T  S(1) (s,B  )  2 

-  (U(B  ),  U(6  ))  (T)  =  -  I  /  { z.  -  rprr - — }  d(M  M  )(s) 

n  o  o  n  1=1  0  i  S  0) (s,B  )  1  1 

o 

T  (2)  S(1) (s,8  ) 2 

=  /  {SU;(s,B  )  -  7-prr - —  }dA(s), 

0  °  S<0  (s,8  ) 

o 


using  (1,6).  Assuming 

S(k)(s,8)  s(k)(s,B),  0  <  s  <  T,  k  =  0,  1,  2 


(2.4) 


uniformly  in  s  and  in  a  neighbourhood  of  6  ,  in  probability,  it  follows  that 
i,  d 

n  2  U(8  )  -*■  N(0,  I)  ,  (2.5) 


where 


T  m  s(1  (s,  8  ) 2 

2  =  a2  =  /  (s'"  ;(s,8  )  -  -T7TT - —  }  dA(s) , 

0  0  s(0)(s,B  ) 

o 


(2.6) 
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To  employ  a  Taylor  argument  we  need 
1(8)  *  a2log  L(8)/362  -  3U(8)/38 

n  T  q(2)  /  /.  n\  2 

--.If  -  {  /Q\(-  >B)>  1  dN  (s). 

i-1  0  S(0)(s,B)  S(0)(s,B)  1 

Using  (2.1)  once  more  we  get 

-  i  1(B)  =  £  E  f  [S/QN(  ^  -  {■"/QN  >  ]  dM  (s) 

n  n  i-1  0  S  0  (s , 8)  S(0)(S,B)  1 

*  '  S<0>(S,6)dA(s). 

»  S(0)(s,B)  S(0)(S,B) 

It  can  be  shown,  using  Lenglart's  inequality,  that  if  6  1  8  ,  then  the  first 


term  goes  to  zero  in  probability,  and 

1  ~  P 

-  ±  K6)  *  E. 


(2.7) 


This  implies,  using 


0  «*  U(8)  -  U(B  )  +  1(6)  (6  -  6  ) 
o  o 

'V  A 

for  some  8  between  8  and  8,  that 

o 

&  (6  -  6  )  -  {-  -  1(6)  T1  n~h  U(6  ) 
on  o 

d 

-*■  I  N(0,Z)  =  N(0,  E  ), 


(2.8) 


since  B  may  be  shown  to  consistent. 

All  this  is  in  Andersen  and  Gill  (1982)  along  with  the  necessary  regularity 
conditions.  That  paper  also  establishes  the  consistency  of  the  natural  estimator 


£  ■  o2  -  /  {S<2)U,B)  -  JAW 

0  S  0) (s , B) 


(2.9) 


for  Z,  and  explores  the  simultaneous  convergence  in  distribution  of  •'n  (6  -  6  ), 


3.  Asymptotic  justification  for  the  bootstrap. 


The  aim  of  the  present  section  is  to  show  that  inference  based  on  the  bootstrap 
method  of  Section  1  is  asymptotically  equivalent  to  the  traditional  inference 
that  is  now  carried  out  each  week  on  the  basis  of  large  sample  results  like 
those  of  Section  2.  With  such  a  result  in  the  rear  one  could  try  to  go  further, 
exploring  the  possible  superiority  of  the  bootstrap  to  the  traditional  analysis 
in  situations  with  smaller  samples,  by  extensive  simulation  studies,  as  for 
example  in  Freedman  and  Peters  (1984),  or  by  theoretical  investigations,  for 
example  along  the  lines  of  Beran  (1982,  1984),  Abramovitch  and  Singh  (1985), 

Efron  (1985a,  1985b).  Some  speculations  of  this  sort  are  offered  in  Section  4. 

With  (1.15)  representing  the  basic  bootstrap  idea  we  hope  to  compare  the 

~  a  *  * 

distribution  of  8  -  8q  with  that  of  8  -  8  given  data.  We  limit  the  discussion 
to  sequences  of  outcomes  where  the  Cox  estimator  8  converges  to  the  true  8q.  This 
event,  call  it  has  probability  1  under  the  regularity  conditions  of  Tsiatis 
(1981).  It  is  not  yet  clear  whether  strong  consistency  of  8  can  be  established 
under  weaker  conditions  of  the  type  made  by  Andersen  and  Gill  (1982);  their 
martingale  techniques  yield  only  convergence  in  probability.  We  will  henceforth 
assume  sufficient  regularity  to  ensure 

Pr(Q  )  =  Pr{ 8  -*•  8  }  =  1. 
o  o 

(Of  course  every  random  element,  including  covariate  processes  and  censoring 
mechanisms,  must  then  be  defined  on  a  proper  common  probability  space.  Without 
the  strong  consistency  assumption  the  results  below  must  be  rephrased  in  a  more 
cumbersome  manner,  and  become  of  "in  probability"  type.) 

We  will  avoid  being  too  general  here,  and  also  avoid  putting  up  all  the 
needed  regularity  conditions;  these  are  rather  given  implicitly  by  the  arguments 
offered  below.  The  i.i.d.  like  framework  of  Tsiatis  (1981)  will  be  sufficient; 


it  Is  suspected  that  also  Andersen  and  Gill’s  conditions  will  make  the  arguments 
work. 


The  bootstrap  sample  is  (X^  ,  6^  ),  i  *  l,...,n  as  in  (1.12).  Define 

Ni*(t)  -  l(Xi*  <  t,  6*  =  1}, 

Y  *(t)  =  I(X°*  >  t,  c  >  t}, 
t 


M  *(t)  -  N  *(t)  -  /  Y  *(s)  dA.(s) 
i  i  0  i  i 


(3.1) 


-  V‘>  -  Ero,c] 


where  A^  is  given  by  (1.8)  and  (1.11).  The  's  become  orthogonal  martingales 
w.r.t. 


r(N±  (s),  Yi  (s)  ;  s  <_  t},  t  >_  0, 


with  variance  processes 
-k  k 


(Mx  ,  H±  )(t)  =  Z[0  t]  Yi  (s)AA1(s)(l  -  AA1(s)},  (3.2) 

* 

cf.  Gill  (1980)  or  Helland  (1982).  Furthermore  Yi  (.)  is  predictable  (or  pre- 


visible),  i.e.  Y^  (t)  is  known  at  time  t-. 


8  was  defined  as  the  solution  to 
n  T  „(1)* 


U*(B)  =  l  }  { z  -  S/0L-S>— >  dN  *(s)  =  0, 
i-1  0  1  S(0)  (s,8)  1 


(3.3) 


where 


(V\k  1  k  * 

’  (s, 8)  =-  I  (z.)  Y.  (s)exp(8z.) ,  k  =  0,  1,  2. 
n  j=l  J  j  j 


(3.4) 


We  also  need 


I*(8)  =  3U*(8)/38 


n  T  _(2) 


(1)’ 


(3.5) 


Proposition.  With  probability  one, 
vn  (8*  -  8) | data  -  N(0,  E-1)  , 


^ i  1  *  “ 

and  both  E  *  -  I  (8  )  and  E  -  -  I  (8)  converge  to  E  in  probability. 

n  n 

Indication  of  proof:  The  Taylor  argument  leading  to  (2.8)  can  be  repeated 
to  give 

-*  i  *  *  .. 

1  T  /0  "  -  2  ”  /oS  (3.6) 


(8  -  8)  *  (-  i  I  (6  )}  n_  i  U  (8), 

-  -  * 

where  6  is  between  8  and  8  .  We  must  prove  that,  for  sequences  in  (i) 

n~''2  U*(g)  t  N(0,  E),  (ii)  -  -  I*(8)  S.  i  whenever  8  £  8  ,  and  (iii)  B*  -►  8  • 

n  o  o 


Basic  to  these  results  is  the  convergence 

.00 


Pr**0<s<T  BeS  ^ S ~  -  E  ldata*  °»  a.s.»  every  e  >  0 

(3.7) 

i.e.  Sv  (s,8)  ought  to  converge  to  the  same  s'  '(s,B)  as  did  Sv  ;(s,B)  in  (2.4) 
uniformly  in  s  and  in  a  neighbourhood  Sq  of  8Q»  in  probability,  given  data,  for 
outcomes  in  £2q.  This  can  be  established  using  a  (weak)  law  of  large  numbers  for 
the  space  of  right  continuous  functions  with  left  hand  limits  on  [0,  T]  to  a 
separable  Banach  space,  available  from  the  proof  of  such  a  (strong)  law  given  by 
Andersen  and  Gill  (1982,  Appendix  II). 

Start  out  writing 


*  n  T  , 

U  (6)  *  L  f  {*. 

0  i 


i=l 


<<■> 

Sl ’  (s ,  8) 


n  T 


<,(!)*,.  ^  * 

+  S  /  iz.  -  VnV*  }  Y<  (s>dVs) 
i-1  0  1  s(0)  (s,8)  1  1 

*  u*(8)  +  U*(8) 

★  ^ 

as  in  (2.3).  Now  U^B)  does  not  vanish  for  8=8,  but  one  may  prove  that 
-U  *  -  P 

n  U2(B)  0,  a. s. , 


by  writing 


K 
t  • 

fc! 

P 

►. ' 

v;- 

>; 

c 


i 


'i'v  vr|yrB  v  '*j." 


-■  -j  *  J_W  i.n, W  '.^ 


AA^s)  -  1  -  11 


expCdz^) 


AA  (  s  )  * 

*  AA(s)exp( 8z . )  -  0  lAA(s)2).  (3.8) 

1  P 

(One  could  for  simplicity  assume  the  covariates  to  be  bounded.)  AA(s)  of  (1.8) 

1  *  - 
is  of  order  0  (— ) .  Next  look  at  n  *  U, (3).  U, (B)  •  U.(8,  T)  is  a  martingale 
p  n  ill 

(k)  * 

(the  processes  Sv  (s,6)  are  predictable  in  this  conditional  bootstrap  framework), 
and 


*■  •>  V 


1*-  *  -  lnTf  s^*(s  6). 2  * 

-  (U.(B),  U.(8))(T)  *  -  I  /( z,  -- - 

n  i  i 


=  /  (S(2)*(s,6) 

o 


i=l  u 
S(1)*(s,8)2 
S(0) *(s, 6) 


.  -  ?Ym*  }  Y<  (s)AA. (s){l  -  AA  (s)} 
1  5  *(s,B)  1  1  1 


}  dA(s), 


where  =  means  ignoring  0  (AA(s)2}  terms.  That 
-U  *  -  d  P 

n  2  U.(8)  -*•  N(0,  I),  a.s. 


follows  now  by  Rebolledo's  central  limit  theorem  for  martingales,  cf.  Andersen  and 
Gill  (1982)  or  Helland  (1982),  using  (3.7),  Pr(fiQ)  =  1,  and  a  necessary  lemma 
which  states  that 


/  H  (s)  dA(s)  -»  /  h(s)  dA(s) 

0  n  0 


(3.9) 


if  H  is  predictable  and  converges  to  h  in  probability.  (3.9)  is  proved  in  the  Appendix, 
n 

Now  (i)  stated  in  the  beginning  of  the  proof  is  demonstrated,  (ii)  and  (iii) 
may  be  arrived  at  by  similar  efforts,  following  the  route  offered  us  by  Andersen 
and  Gill  (1982)  and  repeatedly  using  arguments  involving  (3.8)  and  (3.9)  when 
encountering  new  difficulties,  and  Lenglart's  inequality.  We  will  refrain  from 
giving  all  the  details  here. 


L_1 
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4.  Confidence  Intervals. 

The  standard  confidence  interval  for  8  is  based  on  the  large  sample  result 
•^n  (6  -  8) /t  -*■  N(0,  1),  a  consequence  of  (2.8)  and  (2.9),  writing  t  =  l/o, 

t  *  l/o.  In  fact, 

PrQ{>Zi  (8  -  8)/t  <_  t)  =  4>(t)  +  0(n  ) 

P 

under  mild  extra  conditions,  for  example  boundedness  of  the  covariates.  Thus 


*  < 

i&i 
*  •  *  •  -  ] 


Pr  (8  -  z^-ct^T/Zn  _<  8}  =  1  -  a  +  0(n  "*) , 

8  , 

Pr.(8  +  z(1  a^r/Zn  _<  8)  =  a  +  0(n  J) , 

8 

in  particular  the  standard  interval 

8  -  z^'^x/Zn  1  8  1  8  +  z(1_a)T/^ 

-4 

has  coverage  probability  1  -  2a  +  0(n  ) .  Here 

2(a)  =  -  z(1-a)  =  $-1(a) ,  z(1“°°  -  rV-a). 


(4.1) 


(4.2) 


(4.3) 


These  confidence  intervals  (and  similar  ones  for  one  out  of  several  relative 
risk  parameters  )  are  widely  used  in  biostatistics  and  the  engineering 

sciences  and  can  even  make  it  to  The  New  York  Times. 

The  present  section  is  concerned  with  the  possibility  of  constructing 
confidence  intervals  for  8  with  better  moderate-sampling  properties  than  the 
standard  one. 

Before  we  embark  on  that  journey,  let  us  briefly  comment  on  the  order  of 

magnitude  argument  that  led  to  (4.1).  Results  of  A. I  in  the  Appendix  can  be 

shown  to  imply  that  (a2  -  a2)  converges  to  some  normal  limit  in  distribution. 

It  follows  therefore  from  (2.8)  that  (6  -  6q)  =  U/o2  +  0p(n  )}  n  U(6o) , 

and  the  martingale  n~'2  U(8q)  behaves  in  the  correct  way:  It  converges  to  a 

normal  (0,  a2) ,  and  one  may  show  using  techniques  of  the  Appendix  that  it  has 

skewness  y.  /Zn  where  y.  tends  to  some  y.,  and  kurtosis  y 2Jn  where  y 
1 ,  n  x ,  n 

tends  to  some  y^-  These  facts  surely  indicate  that  the  speed  is  the  usual  Zn 
towards  normality,  i.e.  (4.1)  is  true.  A  more  careful  analysis,  that  perhaps 


!w4| 
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r*  *  a 

also  could  lead  to  a  Berry-Essden  theorem  for  /n  (B  -  6  )/t,  could  start  out 

■  o 

-u  -U 

rewriting  n  U(B  )  as  A  -  n  B  ,  where 
on  n 


A  *  —  Z  f  {z  -  e(s,6  )}  dM  (s), 
n  G  i-1  0  1  °  1 


B  -  —  l  f  Z  (s)  dM  (s) , 

n  i-1  o  n  1 

Z  (s)  *  i/n  {E(s, 8  )  -  e(s,B  )}, 
n  oo 

and  where  we  for  convenience  have  used  E(s,B)  =  (s,B)/S^  (s,8)  and  e(s,B) 

for  its  limit  in  probability  s^  (s  ,B)  /s^  (s,8) .  The  point  of  the  rewriting 

is  that  has  independent  summands,  whereas  U(Bq)  has  dependent  summands. 

If  the  covariates  are  bounded,  then  the  n  distributions  governing  the  n 

martingales  M^,...,M^  are  close  to  each  other,  and  one  can  write  down  an 

Edgeworth-Cramer  expansion  or  even  a  Berry-Esseen  theorem  for  A  .  One  may  further 
'  n 

show,  using  martingale  techniques  as  in  Section  2,  that  (A^,  B^)  tends  in 
distribution  to  (A,  B) ,  say, where  A  and  B  are  independent  and  Gaussian. 

The  brief  discussion  above  can  be  made  rigorous  in  the  sense  of  securing 
(A.l),  and  can  possibly  also  be  used  to  establish  (at  least  the  existence  of) 
an  Edgeworth-Cramfir  expansion  for  /n  (B  -  Bq) . 


A.l.  Percentile  and  bias  corrected  bootstrap  intervals. 

-  -*  d 

The  results  of  Section  3  imply  /n  (6  -  B)/t  ■+  N(0,  1)  a.s.,  where  (t  ) ^ 

-*-l 

Z  *  l/o  .  This  will  imply  that  the  bootstrap  percentile  interval 


G-1(a)  _<  B  <  G_1(l-a) 

is  first  order  asymptotically  equivalent  to  the  standard  interval  (4.2) 


(A. A) 


Let  us  in  what  follows  suppose  that  sufficient  regularity  is  in  force 


to  ensure 
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:>>> 


■N 

% 


Vr.i'/n  (8  -  8)/t  <  t}  *  <t>( t )  +  0(n  5)  a.s.  , 


(4.5) 


- *  «  **— 1 

writing  (t  )*■  •»  Z  ,  where  the  a.s.  in  the  statement  reminds  us  that  the  bootstrap 

probabilities  PrA  are  conditioned  on  the  random  data.  Uniform  boundedness  of  the 

covariates  is  sufficient  for  this  Berry-Esseen  type  statement  to  be  true.  (The 
arguments  below  are  basically  valid  even  without  (4.5)  and  (4.1),  but  results  must 
then  be  rephrased  in  a  cumbersome  manner  less  suited  for  exposition.)  Then 
Pr^{8  _<  8  +  tx  /*/n)  =  G(8  +  tx  /*^n)  *  $(t)  +  0(n  2)  a.s., 
and  one  can  show  from  this  that 

G_1(l  -  a)  »  6  +  z(1"a);*/^  +  0(n_1)  a.s.  , 

G_1(a)  =  8  -  +  0(n_1)  a.s. 

Hence  (4.4)  is  first  order  equivalent  to  the  standard  interval  (4.1),  in  particular 


PrJG  1(a)  <  8  <  G-1(l  -  a)}  =  1  -  2a  +  Oin'S  a.s. 

P 


(4.6) 


There  are  ways  to  incorporate  slight  corrections  to  the  percentile  interval, 
the  simplest  of  which  is  Efron's  bias-corrected  intervals,  see  Efron  (1981b,  1982a, 
1985a) .  Suppose  that  a  smooth  increasing  transformation  h  exists  with  the  property 


^  (h(B)  -  h(8) }  ~  N(-bx,  X2), 
^  (h(8*)  -  h(8) }  ~  N(-bA,  X2), 


(4.7) 

(4.8) 


where  ~  and  ~  mean  "approximately  distributed  as",  b  could  be  zero,  in  fact 
Sections  2  and  3  in  a  sense  show  that  b  will  be  zero,  asymptotically,  to  the 


first  order.  We  may  think  of  b  above  as  a  second  order  fine-tuning  of  this, 

say  b  =  b  /  Jn. 
o 

The  normal  pivotal  assumptions  (4.7),  (4.8)  give  a  bias-corrected 
percentile  interval,  as  in  Efron  (1982a, Ch.  10).  The  result  is 


G“1{4’(z(a)+  2b)}  <  8  <  G'1{<fi(z(1_0l)+  2b)} 


(4.9) 
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—  A  A  -A 

where  b  *  $  (G(0)}  compensates  for  the  possible  bias  of  the  observed  0  as  an 

estimate  of  0.  If  PrA{0  8}  ■  Pr{0  0}  *  h  then  b  »  0  and  (4.9)  reduces  to 
the  ordinary  percentile  interval. 

It  is  not  obvious  that  b  can  be  taken  to  be  the  same  in  (4.7),  (4.8);  this 
is  rather  an  optimistic  (but  educated)  guess  trusting  the  bootstrap  approximation 
idea.  There  are  also  theoretical  reasons  for  believing  in  a  common  b  * 
however.  Further  comments  on  this  and  the  question  of  how  precise  the  ~,  ~ 

statements  in  (4.7),  (4.8)  must  be  are  offered  in  the  next  subsection. 


4.2.  Second  order  correct  intervals. 

A  competing  interval 

“low  i  6  i  “up 

to  (4.2)  and  (4.4)  would  be  more  accurate  if 

Pr0{^LOW  -  '  1  -  a  +  0(n  1),  Pr^B^,  1  0}  *  a  +  0(n  X)  . 


(4.10) 


Perhaps  many  intervals  share  this  property;  we  are  only  interested  in  those 

that  are  "inf erent tally  correct",  which  we  take  to  mean  "based  on  the  Cox 

estimator  0".  (0  is  an  asymptotically  optimal  estimator  according  to  Begun  et 

al.  (1983)  (but  shares  this  property  also  with  for  example  Bayes  estimators, 

see  Hjort  (1985).)  We  should  therefore  look  for  intervals  of  the  type  0LQW  = 

0  -  a\/^n  +  A  ^°Vn,  0  =  0  +  t/’/n  +  A  ^  aVn  with  the  A  's  chosen 

n  UP  n  n 

to  make  (4.10)  true.  Such  intervals  could  be  termed  second  order  correct. 

It  is  clear  that  second  order  asymptotics  in  some  form  or  another  must 
enter  the  discussion  if  such  sharper  intervals  are  to  be  constructed.  The  approach 
described  now  is  due  to  Efron  (1985b).  The  following  may  be  read  as  essentially 
a  review  of  his  method,  but  with  more  attention  paid  to  order  of  magnitude  argu¬ 
ments  and  to  the  assumptions  actually  used  to  make  the  second  order  statements  true 
The  results  of  Section  2  imply  via  the  delta  method  the  more  general  first 


order  asymptotic  result 

Pr0[i^n  { g ( 8)  -  g(0) }/ 1(0)  <  t]  =  $(t)  +  0(n  ^) 


for  each  smooth  transformation  g,  with  Mg)  *  [g'(0)|t.  This  is  not  sharp 

enough  to  get  to  (4.10).  But  assume  that  for  some  smooth  increasing  g  and  some 

cleverly  chosen  constants  a  ,  b 
J  o  o 

•/n  { g ( 6)  -  g(B)>/X(B)  ~  N(-bQ/ /n,  1),  X(g)  =  X(Bo)  +  ao{g(B)  -  g(BQ)} 
in  the  sharper  second  order  sense,  i.e. 

Pr  [i^n  (g(8)  -  g(B)}/X(B)  <  t]  =  $(b  />/n  +  t)  +  0(n  1) , 
p  —  o 

where  X(g^)  is  the  standard  deviation  for  the  limiting  distribution  of  >^n  (g(B) 

-  g(B))  under  some  reference  parameter  value  Bq»  In  order  to  improve  chances 
of  there  being  such  a  sharper  g  both  a  bias-correction  in  the  form  of  bQ/i/n  and 
an  acceleration-correction  entering  the  standard  deviation  have  been  allowed. 

The  limiting  standard  deviation  of  *^n  {g(B)  -  g(B)}  under  B  is  of  course  not 
always  of  the  form  X(Bq)  +  aQ{g(B)  -  g(8Q)},  we  would  in  our  situation  expect 
it  to  depend  also  upon  the  unknown  hazard  function  ct(.)  for  example.  However, 
the  statement  needs  only  be  local  in  character,  i.e.  valid  for  B  in  a  neighbourhood 
of  the  reference  point  8q,  and  the  dependence  upon  a(.)  and  possibly  covariate 
values  may  be  subsumed  in  the  (approximate)  constant  a^. 

There  is  no  harm  in  taking  g(BQ)  =  0  and  X(gQ)  *  1  since  this  can  be  arranged 
by  adjusting  g  accordingly.  We  therefore  state  the  assumption  above  as 

Hn(t)  =  VxB{—  <  t}  =  $(bQ/^  +  t)  +  0(n-1),  (4.11) 

writing  y  =  g(8),  Y  =  g ( B) -  (The  0-term  depends  upon  both  B  and  t;  the  extent  to 
which  uniformity  is  needed  is  discussed  later.)  Notice  the  similarity  to  Edge- 
worth  expansions;  the  r.h.s.  may  be  written  $(t)  +  4> ( t ) 6q/ */ri  +  0(n  ) . 

Under  assumption  (4.11)  a  second  order  correct  confidence  interval  for  B, 
involving  g,  a  ,  b  ,  may  be  constructed.  One  has 


Y  -  Y  -  -  (1  +  a  y)  (Z  -  b  /AT) 

AT  °  0 


where  Z  is  very  close  to  being  standard  normal. 


Pra{Z  -  b  /A 1  <  t}  =  <t(t  +  b  /A i)  +  0(n  ^)  , 
p  o  ~  o 

Pr  (Z  <  t)  *  $(t)  +  0(n  . 

D 


(4.12) 


It  follows  that 


d  a  b 

1  +  a  Y  -  (1  +  a  y)  (1  +  —  (Z  -  —  )}. 
o  o  r  r 

/n 


It  should  be  safe  to  take  logarithms  since  both  Y  and  (Z  -  b  //a)/*^n  are  close 
to  zero;  see  Efron  (1982b,  Sections  4  and  8)  for  a  more  careful  treatment,  and 


the  corresponding  discussion  in  Efron  (1985b).  Hence 

„  d  a  b 

h(Y)  =  b(y)  +  W,  W  =  log  {1  +  —  (Z  -  —  )} 


(4.13) 


where  h(t)  *  log  (1  +  aQt).  The  natural  interval  on  this  h{g(8)}  scale  becomes 
h(y)  -  w(1-a)  <:  h(y)  <  h(Y)  - 

where  Pr  {W  <  w^  }  *  a,  Pr  {W  <  °^  }  =  1  -  a.  Some  analysis  shows  that 

P  P 


w(o°  =  log  U  +  -a  (2(a)  --£)}+  0(n_3/2) , 

& 


and  similarly  for  w^  a\  Using  h  ^(t)  *  (1/a^)  (e*”  -  1)  and  some  algebra 

Yyp  =  h  1{h(y)  -  w^} 

=  ^  +  0(n-3^2) 


Whete  .  ,  .  z(1'a)  +  bo/^ 

YUP  "  Y  +  /n  (1  +  3°Y)  1  -  (a  //n)  +  b  /^} 

o  o 

~  -3/2 

and  similarly  for  YT„„  =  YTriI,  +  0(n  ). 


E 


Our  "correct"  Interval  is  now 


8  <  ft  <  8  ; 

LOW  -  -  UP  ’ 


6low  =  g  (ylow^’  6up  ”  8  (yup^* 


One  has  by  (4.13) 


Pr6{BUP  1  B>  -  Pr6{Yup  1  Y) 


Pr6{h{Yup  -  0(n  3/^2)  }  <_  h(Y)  > 


Prg{h(Y)  -  v(a)  +  0(n_3/2)  <  h(Y) } 


PrJW  <  w(a)  +  0(n~3/2)} 

P 


=  PrQ[Z  -  b  /Sn  <  {✓n/a  }{exp(w^)  -  1}  +  0(n  1)] 
p  o  o 

-  *[b  //K  +  { /n/a  }{exp(w(a))  -  1}  +  0(n-1)]  +  0(n-1) 
o  o 

=  ${z(a)  +  0(n-1)}  +  0(n-1)  =  a  +  0(n-1) , 


and  similarly 


-U 


Pr6{6LOW-  6>  -  1  -  a  +  0(n  ). 


The  Interval  (4.14)  depends  upon  the  unknown  quantities  g,  a^,  b^. 


approximations  to  8^QW  and  8^  must  be  devised.  (4.14)  was  derived  under 


assumption  (4.11);  its  bootstrap  version  is 


Hn  (t)  =  Pr* 


. —  •**  a 

{■  n . ^ Y — ~~~  ^  t}  =  <t(b  I 'fn  +  t)  +  0(n  ^)  a.s.,  (t- 


1  +  a  Y 

o 


-* 


writing  y  =  g ( B  ),  Y  =  g(8).  Assuming  (4.15)  to  be  true  and  remembering 


G(B)  =  Pr*{6  1  8) 

*  .7-  -1 

=  H  (0)  =  <t(b  //n)  +  0(n  )  a.s.. 


-* 


G(6Up)  =  >  i  8<6up)} 


^  .  z(1"a)  +  b  /& 

Pr*[Y  -  Y  1  -  (1  +  a  Y)  ° 

✓n  0  1  -  (a  //n)  iz - + 


0  1  -  (a  / /n)  (z^  +  b  />/n} 


z(1'Ct)  +  b 

o 


★ 

Hn  ' 


1  -  (a  / /n)  {z^3  +  b  //n} 

o  o 


-  4>(z[1-a])  +  0(n_1)  a.s.  , 
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.14) 


and 

*.15) 

(1.13), 


*  zW  +  b  /*£ 

G(B.nu)  -  H  [ - A - ] 

L0W  n  1  -  (a  /X)  {z(a)  +  b  /&} 
o  o 

«  *(z^)  +  0(n  *)  a.s.. 


where 


[1-a] 


:(1-a)  +  b  /& 


Sn  1  -  (a  /t^n)  (z^  +  b  /i/n) 


(4.16) 


and  analogously  for  zL  .  Hence 


b  /*£  -  <t'“1{G(6)  +  0(n_i)  }  =  <t-i{G(B)  }  +  0(n-i)  a.s., 
o 

6UP  *  G-1{$(z[1-a])  +  0(n_1)}  -  G^C-Kz ) }  +  0(n_3/2)  a.s.,  •  (4.17) 

6LOW  =  G_1{<t>(z[a])  +  0(n_1) }  =  G_;L{$(z[a1)}  +  0(n~3/2)  a.s. 


All  this  leads  us  to  propose 


eLOW  =  G_1{$(i[a1)}  <  B  <  G‘'1{$(^1"a])}  =  Byp 


(4.18) 


as  the  "final"  interval,  where 


;[l-«] 


:(1"a)  +  b 
_ o 

:  /./T\  /_(l-a) 


(4.19) 


Sn  1  -  (a  / v^n)  {z  a  +  b  /v^n} 
o  o 

and  correspondingly  for  zl  ,  where  b  / »^n  =  <!>  {G ( B)  } ,  and  where  a  / Sn  is  an 

o  o 

estimate  of  a  / /n ,  a  separate  and  difficult  problem  returned  to  in  the  following 


subsection. 

The  resulting  interval  (4.18)  does  not  depend  upon  the  transformation  g 

or  upon  b  ,  a  ,  and  is  Efron's  acceleration  and  bias  corrected  (ABC)  interval, 
o  o 

Observe  that  if  n  is  large,  then  z^  is  close  to  z^3  and  z^  is  close 
to  z^a\  i.e.  (4.18)  is  not  very  different  from  the  simple  percentile  interval 


(4.4),  which  in  4.1  was  found  to  be  first  order  equivalent  to  the  standard 
interval  (4.2).  (4.18)  purports  to  make  the  necessary  next  order  corrections. 
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S  S 


Each  of  the  proposed  estimates  of  4.3  has  the  property  that 


=  0  (n  ”*)  .  Also,  b  /«/ n  -  b  /v'rT  =  0(n 


a.s.  so  that 


entailing 


•ll-il  ,  ,11-al  +  0  („-l 
P 


Sjjp  =  G~1{<J>(2 [1-ai3)  +  0p(n-1)} 

=  G~1{<t(2[1‘a])}  +  Op(n_3/2) 

■  B0P  +  °p<I'"3/2>’ 

how  ■  how  +  °p(''"3/2>  • 


(A. 20) 


Suppose  that  the  normalising  transformation  g  is  nearly  perfect,  i.e. 
the  0(n  1)  term  of  (4.11)  is  practically  zero.  Statisticians  have  experienced 
wondrous  g  transformations  like  that  in  many  situations,  for  example  Fisher's 
tanh  2  transformation  of  the  correlation  coefficient  and  Wilson-Hilf erty ' s 
cube  root  of  the  chi  squared.  Then  the  pivotal  statement  (4.13)  is  in  force 
with  a  distribution  for  W  that  is  really  independent  of  the  unknown  parameters 
B  and  A,  and  the  interval  constructed  from  it,  h(y)  -  w^1  h{g(B)}  <_ 

h(y)  -  w^  would  be  the  correct  interval  in  a  strong  sense,  both  inferentially 
and  probabilistically,  cf.  the  discussion  following  (4,10).  What  we  have 
actually  shown  is  that  (4.18)  has  endpoints  coming  very  close  to  the  correct 
endpoints  g  ^[h  ^{h(Y)  -  w^3  a^}]  ,  g  3[h  3{h(y)  -  w^a^}],  namely  erring  by 

just  0  (n~3/2). 

P 

This  seems  to  be  Efron's  motivation  for  and  justification  for  the  ABC 
interval  (4.18).  A  perhaps  separate  issue  is  whether  (4.18)  is  second  order 
correct  in  the  sense  of  the  discussion  following  (4.10).  We  have  come  very 
close  to  establishing  (4.10)  too.  Remarks  (2)  and  (3)  below  explain  why  the 
key  assumptions  (4.11)  and  (4.15)  can  be  trusted,  and  these  assumptions  were 


shown  to  imply  that  8T_t,  <  B  <  B,_  had  the  prestigious  0(n  )  property.  The 


question  is  whether  Syp  *  Syp  +  0^(n  2^2)  implies  Pr^CByp  _<$}**  Prg^yp  1  8} 

+  0(n  S .  This  certainly  looks  reasonable,  and  should  be  true  under  very  mild 
conditions.  A  formal  proof  might  perhaps  need  the  existence  of  Edgeworth-Cram£r 
expansions  ror  */n  (Syp  -  8)/t  and  Sn  (@^p  ~  8)/t,  cf.  the  discussion  in  the 
beginning  of  Section  4. 

Remarks .  (1)  From  (4.16)  we  see  that  the  confidence  interval  just  as 

conveniently  can  be  given  in  terms  of  a  *  a0^y^'  and  b  =  b  /Jn  instead  of  aQ  and 

bQ.  We  used  a^  and  bQ  only  for  motivational  purposes  and  for  keeping  track  of 

the  various  oreders  of  magnitude  involved  in  the  discussion.  Indeed  a  ,  b  ,  and 

o  o 

g  are  allowed  to  depend  upon  n  too,  but  in  a  "stable"  manner. 

(2)  The  assumption  (4.11)  is  not  as  restricitive  at  it  may  appear  to  be. 

It  states  that  the  distribution  of  8,  to  a  second  order  approximation,  is  a 
scaled  normal  translation  family,  in  the  language  of  Efron  (1982b).  It  is  an 
implicit  result  of  Efron  (1985b)  and  an  explicit  result  of  DiCiccio  and  Tibshirani 
(1985)  that  such  a  transformation  always  exists  for  one-parameter  problems,  and 
also,  in  appropriate  senses,  in  multi-parameter  and  nonpar ^metric  models.  The 
precise  definitions  behind  these  "appropriate  senses"  are  (in  their  current 
formulation)  evasive  and  circumventive  in  nature,  however,  and  involve  least 
favourable  reductions  to  one-parameter  situations,  and  are  perhaps  not  entirely 
satisfactory.  We  shall  indeed  follow  Efron  (1985b)  and  DiCiccio  and  Tibshirani 
(1985)  and  employ  one-parameter  reduced  models  in  the  following  subsection, 
where  we  struggle  to  find  estimates  for  a  =  aQ//n,  and  essentially  use  the 
arguments  of  4.2  in  the  reduced  model.  Of  course  the  Cox  model  fits  neither  of 
the  categories  mentioned  above,  because  of  its  infinite-dimensional  nuisance 
parameter.  It  can  however  be  closely  approximated  with  one  with  finitely  many 
parameters,  as  explained  in  4.3  below.  On  these  grounds  assumption  (4.11)  can 


be  trusted . 


(3)  It  is  not  at  all  obvious  that  (4.11)  implies  the  other  key  assumption 
* 

(4.15).  H  and  H  are  close,  and  indeed  the  efforts  of  Section  3  establish 
n  n  ’ 

I  *  I 

supt  |Hn(t)  ~  (t)  I  -*•  0  a.s.  Needed  now  is  a  sharper  statement,  for  example 

i/n  sup  |H  (t)  -  H  (t)  I  0  a.s. 
t  '  n  n  1 

or  ideally  (pointwise)  a.s.  boundedness  of  njH^Ct)  -  (t) | . 

Results  of  Bickel  and  Freedman  (1980,  1981),  Singh  (1981),  and  Babu  and 

*  .  -1 

Singh  (1983)  indicate  that  indeed  H  (t)  ■  H  (t)  +  0(n  )  a.s.  A  lesson  learned 

n  n 

from  these  papers  is  that  such  results  should  not  be  taken  for  granted,  however, 
and  that  small  changes  in  the  definition  of  statistics  may  result  in  drastic 
asymptotic  differences.  For  example,  it  may  be  true  that  the  distribution  functions 

O  0*  T~~  A  r"  A  ^  a 

H  and  H  of  the  non-standardised  variables  /n  (y  -  y)  and  /n  (y  -  y) , 
n  n 

respectively,  have  Jn  sup^  |H^°(t)  -  H^0  (t)  j  *  0(log  log  n)  -*■  ®  a.s.  even  though 
suPt  | Hn ( t )  -  (t)  |  -*•  0  a.s.  This  effect  is  not  necessarily  visible  for 

worldly  sample  sizes,  however;  log  log  of  a  (US)  billion  is  4.50.  And  as  pointed 
out  in  Remark  (4)  below  only  pointwise  closeness  is  needed. 

The  cited  papers  employ  the  machinery  of  Edgeworth-Cramdr  expansions  and 
establish  the  validity  of  such  using  Taylor  expansions  of  characteristic  functions. 
Such  techniques  work  well  for  fully  observed  i.i.d.  variables  but  do  not  lend 
themselves  easily  to  the  present  situation,  due  to  the  presence  of  censoring  and 


covariates  and  the  implicit  definition  of  6.  There  are  nevertheless  reasons  to 

* 

believe  that  and  are  sufficiently  close.  In  the  notation  of  earlier 
sections 

Vn  (6  -  6  )  =  {-  —  1(8)  }-1  n_!s  U(B  ), 
on  o 

Sn  (8*  -§)«{--  1(8*)  r1  (n_!s  U*(8)  +  0  (n-^)}, 
n  J.  p 

_j*  *  „ 

where  the  martingales  n  U(8q)  ant*  n  U^(8)  have  very  similar  features.  They 


are  both  asymptotically  normal  with  the  same  variance  I  according  to  Sections 


2  and  3.  Furthermore,  with  a  considerable  amount  of  effort  one  may  show  that 

*  .  r—  * 


they  have  skewnesses  y,  / ,  y,  /^n  and  kurtosises  y_  /n,  y„  /n  satisfying 

x  y  n  X  y  n  /  jH  ^  j n 


Tl,n  *  V  Tl,n  *  Yl'  y2 ,n  -  r2'  Y2,n  *  Y2  for  V  V 


These  facts,  combined  perhaps  with  techniques  as  in  Bhattacharya  and  Ghosh  (1978), 


Hall  (1983a)  .Withers  (1983),  open  up  for  Comish-Fisher  expansion  type  study 


of  the  closeness  of  H  to  H  .  This  is  not  pursued  here. 

n  n 

(4)  Let  us  point  out  the  extent  to  which  the  basic  assumptions  (4.11)  and 


(4.15)  were  used  in  the  construction  of  the  super  bootstrap  interval  (4.18)  and 
the  verification  of  its  second  order  correctness.  (4.11)  was  used  to  get  to  the 
pivotal  statement  (4.13)  and  the  a  and  1  -  a  points  of  W  appearing  there.  It  is 
only  necessary  that  (4.11)  holds  for 


t  =  z(a)  -  b  /  Jn  +  0(n  1)  and  for  t  =  z^ 


-  b  /✓n  +  0(n  ^)  ,  i.e.  roughly  only  for  the  two  points  -z^  and  z^  , 


However,  the  O-term  appearing  in  (4.11),  or  in  (4.12),  must  be  uniform  for  6  in 
a  neighbourhood  of  the  reference  point  8  • 


(4.15)  was  used  to  get  estimates  of  b^/^i  and  for  G(8LQW)  ,  G(8ITp). 


UP 


Precise  estimates  of  (t)  were  needed  only  for  t  =  0  (around  which  the  approxi¬ 


mations  work  best),  for  t  =  (z^  +  b}/{l  -  a(z^  +  b)},  and  for  t  *  {z^  +  b) 


/{I  -  a(z^  +  b)},  i.e.  roughly  only  for  the  three  points  0,  -z^  a\ 


But  again,  the  almost  surely  statement  in  (4.15),  which  here  can  be  read  as 


"conditioned  on  8  being  very  close  to  0Q"»  must  be  uniform  over  a  neighbourhood 


of  8  values. 


(5)  The  proposed  confidence  interval  (4.18)  has  been  given  in  terms  of 


c-1 


G  where  G  is  the  bootstrap  distribution  (1.13).  Of  course  G  is  in  reality 


only  approximated  as  in  (1.14),  requiring  BOOT  evaluations  of  8  •  The  investigation 


of  Efron  (1985b,  Section  8)  indicates  that  BOOT  =  1000  is  a  rough  minimum. 


(6)  An  important  final  comment  in  this  subsection  is  that  there  are 

other  possible  approaches  to  second  order  correct  intervals.  If  we  think  of 

the  "right"  interval  as  having  endpoints  8  -  z ^  ^r/Jn  +  A  ^°Vn,  6  +  z^  °^t 

/Jn  +  A  then  the  problem  is  to  get  hold  of  the  second  order  coefficients 

A  ^  and  A  ^  a\  and  it  is  clear  that  many  different  methods  could  manage 
n  n 

this,  in  the  same  way  as  there  always  are  a  variety  of  consistent  estimators 
for  a  given  statistical  parameter. 

The  route  chosen  in  this  paper  has  been  the  one  invented  in  Efron  (1985b) , 
but  carried  out  in  a  semi-parametric  model.  There  are  also  variations  on  the 
bias  and  acceleration  corrected  bootstrap  interval  (4.18).  It  may  be  possible 
to  devise  approximations  to  bQ  and  aQ  that  require  less  computing  than  the  ones 
proposed  here.  Another  variation  could  use  ideas  from  DiCiccio  and  Tibshirani 
(1985),  involving  the  explicit  construction  of  a  smooth  transformation  g  having 
property  (4.11).  One  such  g  consists  of  a  variance  stabilising  mapping  followed 
by  a  skewness-reducing  transformation.  Possessing  g  one  could  evaluate  the 
interval  (4.14)  directly. 

The  perhaps  most  natural  tools  from  classical  statistics  with  which  to 

fine-tune  large  sample  results  are  expansions  of  the  Edgeworth-Cram^r  and 

Cornish-Fisher  type.  The  key  idea  would  be  to  "remove  skewness"  in  one  fashion 

or  another.  (It  turns  out  in  4.3  below  that  the  acceleration  a  is  connected 

o 

to  the  skewness  of  a  certain  log-likelihood.)  Methods  using  related  ideas 
(expansion  of  log-likelihoods)  were  put  forward  in  early  important  papers  by 
Bartlett  (1953a, b)  and  in  an  unpublished  report  by  Tukey  (1949).  A  recent 
reference  is  Abramovitch  and  Singh  (1985)  who  use  Edgeworth-Cramer  methods  to 
construct  better  intervals,  both  "classical"  (removing  skewness)  and  based  on 
the  bootstrap.  Other  references,  mostly  concerned  with  the  i.i.d.  nonparametric 
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case,  are  Hall  (1983a)  and  Withers  (1983). 

Cox  (1980),  Sprott  (1973,  1980),  Barndorf f-Nielsen  (1985),  and  DiCiccio 
(1984)  use  saddlepoint  approximation  techniques  to  obtain  second  order 
corrected  approximations  to  the  distribution  of  the  maximum  likelihood 
estimator,  and  construct  intervals  based  on  these. 

If  one  could  effectively  estimate  the  skewness  of  the  distribution  of 
8  directly,  the  chi  squared  approximations  as  in  Hall  (1983b)  would  be  an 
attractive  alternative. 

Still  another  and  seemingly  unrelated  method  uses  a  Bayesian  framework. 
If  one  uses  as  endpoints  the  lower  and  upper  a-point  calculated  from  an  a 
posteriori  distribution  for  $,  then  this  interval  can  be  second  order  correct 
(in  the  frequentist  sense  adhered  to  here)  for  a  cleverly  chosen  a  priori 
distribution  for  the  parameters  6,  a(.).  Such  an  approach  is  investigated  in 
general  terms  in  Welch  and  Peers  (1963) ,  Welch  (1965) ,  Peers  (1965) ,  and 
recently  extended  and  clarified  by  Stein  (1985).  Stein's  paper  is  written 
"non-rigorously  but  with  some  care".  Using  such  an  approach  in  the  present 
situation  would  involve  constructing  the  a  priori  distribution  as  a  solution 
to  a  differential  equation  (derived  for  the  approximating  finite-dimensional 
model  also  studied  in  4.3  below)  and  then  carefully  checking  that  each  in  a 
long  row  of  approximations  involved  in  Stein's  arguments  is  of  sufficient 
precision.  The  a  priori  distribution  would  be  improper,  but  different  from 
the  one  derived  from  Jeffreys'  non-lnformativity  principle. 


Let  us  mention  a  final  possibility.  One  may  invert  the  likelihood  ratio 


tests  for  8  *  8q,  using  the  limiting  approximation.  More  specifically. 


consider 


D  (6  )  -  -  2  log  L(8  )/L(8) 
no  o 


n  T  a  S(0)(s,6) 

*2  £  /  {(8  -  8  )z.  -  log  — rjyr - }  dN  (s), 

i-1  0  01  S  °  (s,8  )  1 

o 

where  L  is  the  partial  likelihood  given  in  (1.4).  One  can  show  that  Dn(BQ) 

tends  to  a  ](|  in  distribution  under  8Q>  even  if  the  likelihood  only  is  the 

partial  one,  using  results  of  Andersen  and  Gill  (1982).  This  can  of  course 

be  used  to  test  8=8,  and  a  natural  interval  is 

o 

I  =  (8  :  D  (8  )  <  (z(L-C°)2}. 
o  no  — 

That  I  really  is  an  interval  follows  from  log  concavity  of  L. 

I  is  a  natural  interval  since  it  is  based  on  a  natural  and  perhaps 
even  optimal  family  of  tests.  One  can  also  give  arguments  in  the  direction 
of  showing  that  it  is  second  order  correct  in  the  sense  discussed  after 
(4.10).  Peter  Bickel  has  pointed  out  that  a  Bartlett  correction  factor 
could  lead  to  even  more  precise  intervals,  with  coverage  probability  1  -  2a 
+  0(n~3/2). 

Finding  I  in  practice  would  be  a  difficult  but  solvable  numerical 
problem.  The  situation  is  less  clear  when  the  problem  is  finding  a  second 
order  correct  confidence  interval  for  one  out  of  p  relative  risk  parameters 


in  the  multi-parameter  Cox  model. 


There  is  perhaps  a  common  theme  underlying  the  various  approaches, 
since  nearly  all  of  them  involve  expansions  of  likelihoods  in  one  form  or 
another,  but  this  theme  is  at  present  not  fully  understood. 


4.3.  Computing  the  acceleration  constant. 

The  only  quantity  left  to  specify  in  the  proposed  confidence  interval  (4.18) 

is  the  estimate  a  =  a  //n  of  the  acceleration  constant  a  ■  a  /*n.  Efron  (1985b) 

o  o 

provides  formulae  for  a  in  multiparameter  and  nonparametric  models.  The  Cox 
model  under  study  has  however  an  infinite-dimensional  nuisance  parameter.  Although 
a  direct  approach  is  possible,  working  only  in  the  "continuous"  Cox  model,  we 
shall  below  approximate  it  with  one  with  only  finitely  many  parameters,  find  an 
appropriate  value  for  a  in  this  model,  and  afterwards  take  a  "fine  limit"  to 
get  back  to  Cox. 

The  traditional  statistical  analysis  of  Cox'  model  starts  out  with  the 

partial  likelihood  (1.4).  The  real  or  full  likelihood  can  also  be  written  down, 

but  involves  the  hazard  rate  a(.): 

n  T 

L(8,  a(.)}  «  n  exp  [f  {log  a  (s)dN  (s)  -  Y  (s)a.(s)ds)], 

i  =  i  0  11  11 


i.  e. 

n  T 

log  L{£5,  a( . ) }  =  E  /  [{log  a(s)  +  Sz.JdN  (s)  -  Y  (s)exp(Bz  )a(s)ds] .  (4.21) 

i_l  0  1  i  1  i 


L{8,  a(.)}  is  in  fact  (proportional  to)  the  Radon-Nikodym  derivative  of  the 
probability  mechanism  governing  the  (N^,  Y^)  processes  under  (B,  a(.))  w.r.t. 
a  product  of  simple  Poisson  processes,  see  e.g.  Br&naud  and  Jacod  (1977,  Sections 
2.7  and  3.5)  or  Aalen  (1978,  Section  3). 

Let  us  approximate  the  above  model  with  one  where  a(.)  is  taken  constant 
on  each  of  many  small  intervals.  Split  [0,  T]  into  m  such  intervals  with  mid¬ 
points  Sj  and  lengths  ds^  ,  and  let  a(s)  =  a(s^.)  for  s  in  interval  no.  j.  The 
resulting  model  has  m  +  1  parameters  and  can  be  handled  in  Efron's  (1985b) 
framework.  The  intention  is  to  let  m  tend  to  infinity  and  the  max  mesh  to  zero 
after  the  necessary  intermediate  analysis  (and  n  is  fixed).  We  shall,  more  for 
notational  and  technical  convenience  than  out  of  necessity,  assume  that  Y.  is 

constant  on  each  of  the  m  small  intervals.  dN.(s.)  denotes  the  increase  of 

i  J 

counting  process  over  interval  no.  j  (and  is  0  or  1). 


30 


First  we  need  maximum  likelihood  equations  and  information  matrix  in  the 
reduced  model.  Write  L(m)  -  L<m)(e]D)  for  the  (full)  likelihood  in  the  approxi¬ 
mating  model,  i.e. 


log  L 


(m)  _ 


n  m 


Z  Z  [{log  a(s . )  +  Bz.}dN  (s.)  -  Y  (s  )exp(8z  )a(s.)ds  ], 
i-1  j-1  J  1  1  J  1  J  1  J  J 


(4.22) 


where  9  *  (6,  a(s, ) , . . . ,a(s  ))  is  the  parameter  and  D  =  {(N.(s),  Y,(s))  ;  s  <  T} 
l  m  l  i  — 

is  the  data.  Calculations  give 


31og  L 
38 


(m)  n  m 


Z  Z  {z  dN  (s.)  -  Y  (s  )z  exp(8z  )ot(s.)ds  } 
j»i  1  1  J  i  j  i  i  3  j 


n  T 


=  Z  f  z  {dN  (s)  -  Y  (s)exp(8z  )a(s)ds}, 

j n  *  ^ 


31og  L 


i«l 


(m)  n 


\n(~  \ —  ■  2  {dN  (s  )/a(s.)  -  Y  (s  )exp(Bz  )a(s)ds}, 

i=l  J  J  ^  J 


32log  L 


»  dN(s.)/a(s  J  -  S^Q\sy  8)ds^  ; 
(m)  n  m 
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Z  Z  Y  (s  )z  2exp(8z.)a(s  )ds. 
i=lj  =  l  1  3  1  12 


=  -  n  /  S^(s,  6)a(s)ds, 
0 


383a(s~ )  =  -  Yi(sj)Z.exp(6zi)ds.  =  -  nS(1)(sjS  8)dSj, 


32log  L 


J 

(m) 


3a(si)3a(s|2) 


dN(s.)/a(s.)2  6.p . 
J  J  3* 


*  is  used  to  indicate  the  result  evaluated  back  in  the  Cox  model,  i.e.  after 

having  m  sent  to  infinity.  It  follows  that  the  maximum  likelihood  estimators 

8,  a(s. ) , . . . ,a(s  )  satisfy 
i.  m 


a(s ,  )ds 


dN  ( s_)  /  n 


2  2  (s . ,  8) 


n  m 


Z  Z  z  {dN  (s.)  -  Y  (s . )exp(8z . )a(s . )ds . }  =  0. 
i  =  l  1  1  J  1  J  J  3  3 


n  T 


The  l.h.s.  of  the  last  equation  *  Z  ^  / q  {z^  ~  E(s,  8)}dN^(s),  where  we  now 


wr  ite 


E(s,  6)  -  S(1)(s,  8)/S(0)(s,  8).  (4.23) 

It  is  encouraging  to  notice  that  8  obtained  in  this  way,  after  letting  m  tend 

A 

to  infinity,  is  identical  to  the  usual  Cox  estimator  8,  cf.  (1.5);  indeed  we 
shall  not  distinguish  between  them  in  what  follows.  A  similar  remark  applies 
to  the  cumulative  estimated  intensity  A,  compare  (1.8).  This  can  be  taken  as 
additional  credit  to  the  partial  likelihood  approach  and  is  related  to  earlier 
findings  of  Johansen  (1983)  and  Bailey  (1984). 

The  observed  information  matrix  is 


I11  I12 
•X21  *22 


with  elements 


I  =  n  /  Sv^'(s,  8)dA(s), 
11  0 


I12  =  {nSw(.JiB)dsJ}J^f 

i22  *  diag  (dN(Sj)/a(s^)2;  j  £  m}. 


(4.24) 


Next  we  need  Stein's  (1956)  notion  of  a  least  favourable  one-parameter 

A  A  A  A 

family  for  8  at  the  fixed  parameter  point  9  =  (8,  o(s, ) , . . . ,o(s  )).  The  least 

1  m 

favourable  direction  at  this  point  is  p  ■  I  v,  where  v  is  (3y/36,  3y/3a(s^) , 

...,3y/3a(s  ))'  evaluated  at  0,  and  where  y  =  y(9)  is  the  parameter  of  particular 
m 

interest.  Here  y  is  just  8  and 

- f  l  )  f  i11  1 


y  =  I 


i21  ' 


(4.25) 


Well  known  matrix  formulae  and  some  algebra  yield 

-11  -  -  -  _i  -  _i  i  -_i 

I  =  (I  -I  I  A  i  )  A  =  —  y  1 

Vill  12  22  i21;  n  L  ’ 

i21  -  -  L,  i11  =  {-  E(Sj,  6)  a(s . )  -  E-1}_  . 


(4.26) 

(4.27) 


J  J  ' \r  ■'  ■  -• 


•7  «r r1  v v\*»  v?: v" r. v „-v ■„% l.v ^ 


A  <« 

Stein's  least  favourable  family  is  the  one  passing  through  0  in  the  direction  u, 

F  -  {L(m\t|D)  h  L(m)(0  +  xG|d)}.  (4.28) 

t  is  in  a  neighbourhood  of  0  and  is  the  only  unknown  parameter  here;  D 
denotes  a  hypothetical  new  collection  of  data  {(N^(s),  Y^(s));  s  <_  T}  drawn  from 
L  (0  +  xu | D) .  The  Fisher  information  bound  for  an  estimate  of  6(x)  =  8  +  xl  , 
evaluated  at  t  =  0,  is  the  same  as  the  bound  for  estimating  8  in  the  original 
m+1  parameter  family,  evaluated  at  0.  F  is  the  only  reduction  to  a  one-parameter 
family  where  the  problem  of  estimating  8  does  not  become  artificially  simpler. 
See  also  Tibshirani  and  Wasserman  (1985). 

Efron's  formula  for  an  estimate  of  the  acceleration  a  =  a^m^  is 


;(m) 


7  SKEWTmQ  log  L(m)(0  +  xy|D) ). 


(4.29) 


The  result  after  differentiating  log  l/m^{8  +  xl^\  a(s^)  -  xE(s^,  B)a(s^) 

a  A  ^  ^  '■“H  A  A 

...,  a(s  )  -  xE(s  ,  8)n(s  )  I  }  w.r.t.  t  and  putting  x  *  0,  where  E(s,,  8) 
m  m  m  j 

is  (4.23)  evaluated  for  8=8  and  with  the  "new  data”  D,  is  the  variable 


;(m)  _  -11 


n  m 


=  I  l  l  { z  -  E(s . ,  8)}  (dN  (s  )  -  Y. (s  )exp(8z  )o(s.)ds. }.  (4.30) 
Kj=l  1  3  1  3  13  3  3 

a(m)  *  i  skew  {v(m)}  can  now  be  evaluated  in  an  explicit  way,  using  ("time- 

D 

discrete")  ideas  from  the  proof  of  the  ("time-continuous")  lemma  in  the  Appen¬ 
dix.  We  are  more  interested  in  a,  the  limit  of  a^m^  as  m  tends  to  infinity.  One 

may  prove  that  a  =  \  SKEW  {v}  where  V  is  the  limit  variable 
b 


i  t-i 


n  T 


V  -  -  Z  l  /  { z,  -  E  (s, 8) )  (dN.  (s)  -  Y.  (s)dA.(s)}, 
n  n  1  i  i  i 


(4.31) 


i»l 


it  it 

where  N^  ,  Y^  now  denote  hypothetical  data  drawn  from  the  distribution  with 
cumulative  hazard  A^  of  (1.11),  i.e.  exactly  equivalent  to  coming  from  a 

*  *  *  *  it 

bootstrap  sample  (X^  ,  5^  ),...,(X  ,  <5n  ) ,  compare  (1.12),  (3.1).  E  (s,8) 

n  *  .  n  *  fl>*  (03* 

here  denotes  (s)exp(8zJ/E^_^Y^  (s)exp(8Zj)  =  Sv  (s,B)/Sv  (s,B). 


i-  •/. . 
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One  possible  method  of  computing  a  is  therefore  to  exploit  the  generated 


bootstrap  data  samples  (even  more).  Each  of  the  BOOT  bootstrap  samples  leads 

-  ,*b 

not  only  to  a  realisation  of  6  but  also  to  a  realisation  of  V,  say  V  .  This 

V  could  for  example  be  computed  at  the  beginning  of  the  numerical  routine  that 

a  it  a 

finds  B  ,  compare  (3.3).  Now  use  the  empirical  skewness  of  these  BOOT  V-values 
as  an  estimate  of  the  skewness  of  V,  i.e. 

a  =  -r  SKEW  {V  , . . . ,V  }.  (4.32) 

D 

Of  course  the  constant.  “  ^  ^  may  be  removed  before  computing  the  skewness 

The  scheme  above,  taking  advantage  of  the  data  and  the  model  using  raw 

computing  power,  is  in  the  true  bootstrap/meat  axe  spirit,  and  does  not  require 

any  theoretical  knowledge,  however  interesting,  of  for  example  the  skewness  of 

a  martingale.  If  Statistician  A  followed  the  procedure  above  to  fill  in  his 

value  for  a  *  a  /Sn  in  the  confidence  interval  (4.18),  then  he  has  done  his 
o 

job  and  has  the  (second  order)  right  not  to  be  interested  in  an  explicit 
formula  Statistician  B  may  have  worked  out  for  a. 

The  present  author  will  nevertheless  join  forces  with  Statistician  B 
and  proceed  working  on  alternative  approaches  to  the  computation  of  a.  Explicit 
(or  less  implicit)  expressions  are  always  valuable,  and  can  here  lead  to  a 
better  understanding  of  the  acceleration  factor,  but  the  prime  reason  for 
carrying  out  the  reasoning  below  is  that  the  formulae  that  are  obtained  are 
of  use  also  in  other  statistical  problems. 

Instead  of  V,  consider 

n  T 

r 

i-1 

V  is  just  a  constant  times 


V  ■  —  I  *  I  f  fz,  -  E(s,B  )}  {dN  (s)  -  Y  (s)exp(B  z  )dA(s)}.  (4.33) 

n  ,,qi  o  l  l  oi 
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(4.34) 


U-—  r  /  {*  -  E(s,8  )}  dM.(s),  (4.34) 

&  i-1  0  1  0  1 

employing  once  more  the  martingales  (2.1).  a'  »  \  SKEW  { V}  is  considerably 

o 

easier  to  give  a  formula  for  than  is  a  |  SKEW  {V},  and  it  will  also  be 

easier  to  construct  estimates  a'  for  a'  with  sufficient  precision.  The  results 

of  the  Appendix  can  be  shown  to  imply  that  for  the  proposed  a'  displayed  later 

a'  -  a  =  0(n  ^)  a.s.,  or  a' -a  =  /n  a'  -  a  m  0(n  2)  a.s.,  which  is 

oo 

good  enough,  compare  (4.19)  and  the  arguments  following  it.  Using  a'  instead 
of  the  a  that  would  result  from  explicit  moment  evaluations  saves  us  from  a 
large  amount  of  terms  like  {1  -  AA^s) }  3AA^(s)  -  AA^(s)3{1  -  AA^(s)}  etc., 
cf.  (3.8).  The  a-ingredient  chosen  for  citation  in  the  preceding  sentence 
will  in  its  a'  version  be  simply  exp(8z^)dA(s) . 

An  exact  expression  for  a'  is  obtained  as  follows,  using  formulae  for 
the  second  and  third  moments  to  be  found  in  the  Appendix.  One  has  EU2  =  ER^ 
and  EU3  *  ER^/i/n  where 


-  n  T 

R,  »  —  I  /  { z  -  E(s, 6  )}2  Y  (s)exp(S  z  )dA(s) 
2  n  ,  i  o  i  o  i  oi 


=  ;  (S(2)(s,e  )  -  S(1)(s,8  )2/S(0)(s,e  ) }dA(s) , 


(4.35) 


1  n  T 

R,  =  —  1  /  (z.  -  E(s,8  )}3  Y.(s)exp(S  z  )dA(s) 

J  n  .  ,  n  i  o  i  oi 


T  t  n 


Hence 


>  1  w  u  u 

+3  —  f  f  I  (z  -E(s,S  )}dM  (s)  I  (z  -E(t,8  ) }2Y. (t)exp(8  z.)dA(t). 
n  0  0  i=1  i  o  i  j  o  j  o  j 


(4.36) 


a'  =  ER3/{6»^  (ER  )  '  }. 


(4.37) 


There  are  a  number  of  ways  Co  proceed  to  get  hold  of  (a  version  of) 


a'.  One  might  stick  to  exact  expressions,  using  for  example 


EY.(s)  =  Pr{X°  >  s,  c.  >  s}  =  exp{-A(s)exp(3  z.)}  I{c.  >  s}, 
i  i  —  i  —  oi  i  — 

and  arrive  at  long  and  complex  expressions  for  ER0  and  ER^.  Then  one  should 
plug  in  S  and  A  for  6q  and  A  where  necessary  and  compute  the  whole  thing. 

(The  formula  above  applies  when  the  censoring  time  c^  is  known;  in  the  random 


censorship  model 


EY.(s)  =  exp{-A(s)exp(B  z.)}  R[s,°°) 


is  appropriate,  where  R  denotes  the  distribution  of  censoring  times,  and  for 


which  a  Kaplan-Meier  estimate  is  available.) 


Or  one  might  use  some  approximations.  It  is  demonstrated  in  the  Appendix 

A  A  a 

that  R„  =  a2  =  ER  +  0  (n  2) ,  so  that  o2  is  a  good  enough  estimate  of  EU2 , 

2  2  p 

...  -k 

and  that  R_  =  R_  +  R_"  is  within  0  (n  )  of  ER.,  where 
3  3  3  p  3 

1  n  T  .  .  . 

R  '  =  —  £  /  { z.  -  E(s, 3) ) 3  Y . (s) exp (Bz.)dA(s) ,  (4.38) 

3  n  i=  0  1  1  i 


„  nTt 

R  "  =  -3  —  l  f  f  {z . -E(s , 3) }exp(Sz . )dA(s) {z . -E(t , 3) }2Y . (t)exp (Bz . )dA(t) . 
3  n  i=l  0  0  1  1  1  1  x 

(4.39) 

It  follows  that 


a 


(4.40) 


6/n  a3 

is  within  0  (n  ^)  of  a'  =  —  SKEW  (U)  =  -r  SKEIJ  {V}  and  also  within  0  (n 

p  6  6  p 

of  1/6  times  the  skewness  of  U  evaluated  at  the  maximum  likelihood  estimates 

.  . 

3  and  A,  and  finally  within  0  (n  )  of  a  computed  as  in  (4.32). 

P 

The  confidence  interval  (4.18)  obtained  by  inserting  a'  above  is  there¬ 
fore  also  second  order  correct.  Variations  exist.  Since  a  is  computed  in  any 
case  and  since  R  '  also  is  easy  and  natural  to  compute  from  the  data,  in  that 


it  provides  a  measure  of  how  skew  the  distribution  of  covariate  calues  z^ 
is  in  the  actual  experiment,  one  might  consider  keeping  these  in  the 
formula  for  a'  but  bootstrapping  to  get  hold  of  an  equivalent  version  of  R^”, 
viz. 


T  t  n 


R,"  =3  -  f  f  l  {z.-E  (s , 8) }dM  (s)  E  {z  -E  (t,8)}zY.  (t)exp(Bz4)dA(t) 

3  n  0  0  i=1  1  1  j=1  J  J  j 

compare  (4.36),  (4.39). 


There  is  basically  a  choice  between  the  bootstrap  based  a  of  (4.32), 
requiring  a  large  number  of  computations  of  the  variable  V  of  (4.31),  and 


the  explicit  formula  a'  involving  and  R^”.  The  latter  choice  displays 


the  oldfashioned  un-bootstrap  feature  of  consistently  estimating  a  population 
parameter  in  an  explicit  way,  but  the  first  choice  may  still  be  the  most 
practical  one,  given  that  the  statistician  has  decided  to  generate  bootstrap 
samples.  A  final  reason  for  working  out  the  a’  formula  is  that  one  conceivably 
might  construct  approximations  to  the  bootstrap  distribution  G  itself 
without  bootstrap  samples,  perhaps  using  Edgeworth-Cramer  expansions  techniques 


or  perhaps  using  a  method  similar  to  one  invented  in  DiCiccio  and  Tibshirani 


5.  Concluding  remarks. 


(1)  It  was  shown  in  Section  4.2  how  the  assumptions  (4.11),  (4,15)  led 
to  the  second  order  correctness  of  the  confidence  interval  (4.18).  General 
difficulties  with  multiparameter  families  caused  however  the  somewhat  evasive 
treatment  of  the  acceleration  factor  a  presented  in  Section  4.3,  where 
recourse  was  taken  to  a  certain  least  favourable  one-parameter  family.  Thus, 
following  Efron's  (1985b)  construction,  we  rather  employed  the  equivalent 
local  version  of  (4.11)  for  this  least  favourable  reduction.  The  arguments 

of  Section  4.2  easily  carry  over  to  the  reduced  model. 

Part  of  the  problem  is  the  difficulty  of  obtaining  a  proper  and  universally 
agreed  upon  definition  of  a  second  order  correct  interval  in  the  presence  of 
nuisance  parameters.  We  may  speculate  with  Efron  (1985b)  that  given  a  suitable 
and  sensible  definition,  (4.18)  will  indeed  be  second  order  correct.  More  work 
is  needed  in  this  area.  A  study  of  transformations  to  approximate  normality 
in  multi-parameter  models,  parallelling  and  extending  the  work  of  Efron  (1982b), 
would  be  welcomed . 

(2)  The  first  half  of  Section  4.3  was  concerned  with  the  problem  of 
obtaining  the  least  favourable  one-parameter  family  in  the  Cox  model,  considering 
ot(.)  as  an  infinite-dimensional  nuisance  parameter.  The  problem  was  solved 

via  intermediate  analysis  in  a  finite-dimensional  approximating  model.  One 
can  also  obtain  the  same  result  separately  and  more  directly,  working  only 
in  the  time-continuous  Cox  model,  using  a  suitable  extension  of  Stein's  notion 
of  a  least  favourable  reduction  for  models  with  infinite-dimensional  nuisance 
parameters.  Such  an  analysis  is  indeed  provided  by  Begun  et  al.  (1983,  Section  6) 
(with  slightly  different  assumptions) .  Both  approaches  are  valid;  there  is  only 
the  recurring  problem  of  "where  to  put  the  hard  part". 


(3)  The  most  important  extension  of  the  model  studied  in  Sections  1-4 

is  the  traditional  p-variate  Cox  model,  in  which  individual  no.  i  has  covariates 

zi  »  (z^  ^.....z^  )'  and  hazard  rate  a^s)  =  a(s)exp(0'z »  a(s)exp(6^zi  ^  + 

...  +  8  z.  ).  Most  of  the  discussion  of  earlier  sections  goes  through  for 
P  l.P 

the  p-variate  model,  with  only  minor  modifications.  In  particular  the  basic 

A  A 

bootstrap  scheme  is  as  in  (1.10)  -  (1.12),  only  with  8, z  +  ...  +  8  z. 

1  i,i  p  i,p 

replacing  8z..  There  are  now  p  bootstrap  distributions  G. (t)  =  Pr  (8.  <  t). 

1  i  *  i 

There  are  p-variate  versions  of  Sections  2  and  3,  involving  at  crucial 
points  a  multivariate  martingale  central  limit  theorem.  Involved  in  this  is 
a  p  x  p  covariance  matrix  E  with  elements 


a  =  /  {s.(p)(s,6  )  -  s.(1)(s,B  )s„(1)(s,B  )/s(0)(s,8  )}dA(s), 

jVgjKQj  O*  O  O 


(5.1) 


cf.  (2.6),  where  s^^(s,Bq)  and  sjg\s,8o)  are  the  limits  in  probability  of 

respectively  S^  '(s,8)  -  -  JYi(s)exp(8,zi)  and  S^g  (s,8)  *  -  E^^  g 

Yi(s)exp(8'zi) .  Also  involved  is  the  natural  analogue  of  (2.9),  a  consistent 

estimator  E  with  elements  o^g .  After  working  through  the  details  one  arrives 

at  an  asymptotic  (first  order)  justification  of  the  bootstrap  procedure,  and 

-  _1  -  _i 

in  particular  it  may  be  verified  that  (a)  _<  6^  ^  G^  (1-a)  is  asymptotically 
equivalent  to  the  traditional  large  sample  theory  based  confidence  interval 
for  8i>  namely  8^  -  z^  (aii)"S/i^n  _<  8  _<  6^  +  z^1  (oii)^//n,  writing 

as  usual  and  for  the  elements  of  respectively  E  ^  and  E  ^ . 

Suppose  a  second  order  correct  interval  is  sought  for  the  parameter  8^.  It 


can  be  demonstrated  along  the  lines  of  Section  4.2  that 

Gi"1{*(z‘i[ct])}  <  <  G1_1{<Kz1tl~a]) 


(5.2) 


is  such  an  interval,  where 
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L-a 


and  similarly  for  z^  ,  and  where  J Vn  -  4>  {G^(B^)}.  It  remains 

only  to  find  the  appropriate  acceleration  factor  a^  *  a^ 

The  treatment  of  this  problem  in  the  univariate  case  of  Section  4.3  can 
be  parallelled.  The  approximating  finite -dimensional  model  with  a  likelihood 
analogous  to  (4.22)  has  p  +  m  parameters.  Skipping  many  details,  one  arrives 
at  the  equivalent  of  (4.31), 


\  -  ~  I  f  K1>;L(s,B)  (dNi"(s)  -  Yi"(s)dA1(s)}, 


(5.4) 


where 


K,  ,(s,6)  -  I  alu  (z,  -  E  *(s ,6)}, 

l,i  -  i,u  u 

u*l 


(5.5) 


E  (s,B)  =  SfU;  (s,8)/SW  (s,8)  =  I  z.  Y.  (s)exp(B'z.)/  E  Y.  (s)exp(e’z.) 

U  U  ^  1  >  u  X  X  1 

(5.6) 

As  in  Section  4.3  there  are  at  least  two  ways  to  proceed  to  get  hold  of 
a^  *  ^  SKEW  {V  },  or  another  second  order  equivalent  version  lying  within 
0^(n  ^)  of  a^.  One  effective  method,  given  that  the  statistician  has  decided 
to  generate  bootstrap  samples  in  the  first  place,  compare  the  discussion  in 

»  *b  A 

Section  4.3,  is  to  evaluate  bootstrap  replicates  along  with  the  8^  '  s, 

.  A 

and  use  1/6  times  the  empirical  skewness  of  these  values  for  a^.  Another 
possibility  is  to  use  an  explicit  formula.  Consider 


V  «—  E  /  K.  . (s , 8  )  {dN.(s)  -  Y.(s)exp(8  ’z.)dA(s)}, 
1  SZ  i-1  0  1’i  °  1  1  0  i 


(5.7) 


K.  . (s, 8  )  =  E  a  {z.  -  E  (s,B  )}, 

X>X  O  a  1 «u  u  o 

u*=l 


(5.8) 


writing  Eu(s,8)  *  S^^  (s , 8) /S ^  (s  ,8) .  The  plan  is  to  evaluate  the  skewness 
of  V,  at  the  maximum  likelihood  estimators  6,  A. 


w 


Some  matrix  algebra  combined  with  results  from  the  Appendix  can  be  used 

■>11  -51-11  .  -k  -11  ,  -h 

to  show  that  EV.  2  m  a  +  0(n  )  *  o  +  0  (n  ^) ,  i.e.  a  *  EV.2  +  0  (n  ^) . 

l  p  1  p 

Also  needed  is  EV^3,  expressions  for  which  are  obtainable  via  the  methods  of 
the  Appendix.  The  result  is  that 


'  «  „  _L_  !Ll£  * 

1  6v£  (cn)3/2 


a.  +  0  (n  1)  , 
1  P 


(5.9) 


where 


n  T  p 


R  "  «  -3 
1,3 


E  / 

[  E  o 

(z, 

i=l  o 

u=*l 

i,u 

1  n 

T  t 

,  l 

-  E 

/  ; 

[  E  o 

n  j  i 

i=l 

0  0 

u-1 

(5.10) 


[  E  alu{z1  u-Eu(t,6) )]2exp(B,zi)Y1(t)dA(t) . (5.11) 
u*l  * 


(4)  The  formulae  derived  in  the  Appendix  for  the  skewness  of  a  martingale 
have  use  also  in  other  problems.  Let  us  briefly  point  out  two  applications. 

(4a)  Consider  first  a  parametric  Cox  model,  where  the  underlying  common 
a(.)  is  assumed  constant  over  the  time  interval  [0,  T] ,  say  a^(s)  *  0exp(8zi) . 
The  log  likelihood  for  this  model,  provided  the  data  are  collected  over  this 
interval,  for  example  in  the  form  of  counting  processes  and  at-risk  indicator 
Y^  as  in  previous  sections,  becomes 

n  T 

log  L(8,0)  =  E  /  {(log  9  +  Sz  )dN  (s)  -  Y  (s)0exp(6z  )ds).  (5.12) 

j_=  ]_  0  11  1  1 

The  maximum  likelihood  estimators  8,  0  solve  the  equations 


31og  L 
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n  T 

E  /  z,{dN  (s)  -  Y  (s)9exp(Bz  )ds)  =  0, 

1=1  0  1  1  i  i 


1 

E  f  —  (dN  (s)  -  Y  (s)9exp(8z.)ds)  =  0. 


v~' 


'Vf  C* 


rj  JT.  V",  ^  r  -■ 


0  and  9  are  asymptotically  optimal  estimators  by  the  results  of  Hjort  (1985). 

o* 

Bootstrapping  can  be  performed  as  follows:  Generate  X^  from  the 
estimated  distribution  F^,  exponential  with  parameter  9ftxp(8z^),  and  let 
X^  *  min{X°  ,  Cj},  6^  *  l{X°  _<  c^ ) .  This  leads  to  and  Y^  as  in  (3.3), 

a  a  ^ 

i  *  l,...,n,  and  then  to  bootstrap  replicates  8,9.  Let  G(t)  *>  Pr^{8  t). 

Suppose  a  second  order  correct  confidence  interval  for  8  is  to  be 
constructed.  This  can  be  done  exactly  as  in  Section  4,  but  with  a  new  formula 
for  a  *  Following  the  reasoning  of  Section  4  in  this  case,  which  is 

similar  but  in  fact  much  easier,  one  gets  a  =  —  SKEW  {V},  where 

D 

,  n  T  t  * 

V - E  /  { z  -  E(6) }  (dN.  (s)  -  Y  (s)  9exp(Bz  Jds} 

&  i=l  0  1  1  1  1 

1  n  »  *  T  * 

=  —  E  (z  -  E(8) }  (N  (T)  -  0exp(8z  )  I  Y,  (s)ds).  (5.13) 

^  i=l  1  1  1  0  i 

T  (i)  T  (o) 

Here  E(8)  *  fQ S  (s,8)ds//QS  (s,8)ds.  a  may  be  computed  using  bootstrap 

replications  of  V,  or  using 


EV2  =  6  (F(2)(8)  -  F(1)(8)2/F(0)(8)  >, 

i  l  n  «  «  ,  T 

EV3  i  i  E(8)  }30exp(8z  )  /YJ(s)ds 

^  n  i=l  1  1  0  1 

n  T 

-  3  -  E  (z  -  E(8)}392exp(26zJ  /  tY.(t)dt), 

n  i*l  1  *  0  1 

where  F^(8)  =  /  S^^(s,B)ds. 

Similarly  other  first  order  asymptotic  statements  in  other  parametric 
survival  data  models  may  be  "corrected"  to  second  order  using  the  bootstrap 
approach  and  results  from  this  paper. 
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(4b)  As  a  second  example  of  the  use  of  the  developed  methods,  consider 

the  "Kaplan-Meier  problem"  of  obtaining  nonparametric  estimates  of  the  unknown 

t 

cumulative  distribution  F  and  the  unknown  cumulative  hazard  A(t)  *  /  a(s)ds 
based  on  a  partially  censored  sample  of  n  observations  from  F.  Let  again  X° 
be  the  uncensored  observation  for  individual  no.  i  and  let  c^  be  the  censoring 
time,  so  that  *  min{X°,  c^}  and  6^^  =  l{X°  <_  c^}  are  observed. 

t  _  _ 

The  natural  estimator  for  A  is  Nelson  and  Aalen's  A(t)  =  /QdN(s) /Y(s) , 

where  N^s)  =  l{X°  _<  s,  6^  =  1},  Y^(s)  ■  1{X°  _>  s,  c^  _>  s},  and  N  =  E^jN^, 

_  n  „  _  _ 

Y  =  T^e  Kaplan-Meier  estimator  for  F  is  F(t)  *  1  -  11^  t ^  {1  -  dN(s)/Y(s)}, 

and  lies  uniformly  within  0  (n  of  1  -  exp{-A(t)},  see  for  example  Hjort 

P 

(1984,  Section  3). 

If  M.(t)  =  N  (t)  -  /V(s)ot(s)ds  and  M  -  E*M,,  and  A(t)  -  /*J(s)a(s)ds 

1  1  U  1  1=1  1  0 

where  J(s)  ■  I{Y(s)  >  0},  then 

_  t  _  ___  __ 

Z  (t)  =  t/n  (A(t)  -  A(t)  }  *  y/ri  f  J(s)dM(s) /Y(s) 
n  0 


n  t 


—  E  /  (nJ(s)/Y(s)}  dM. (s) 
^  i-1  0  1 


(5.14) 


Let  us  assume  the  random  censorship  model,  where  the  c^'s  are  drawn  independently 
from  a  distribution  R,  and  independently  of  the  X°*s.  Then  EY^(s)  =  F[s,®)R[s,tx>) 

=  exp{-A(s) }R[s ,<*>)  =  y(s),  and  Y(s)/n  converges  to  y(s)  uniformly  on  [0,  T]  in 
probability,  also  nJ(s)/Y(s)  converges  to  l/y(s)  uniformly  on  [0,  T]  in  probabi¬ 
lity  provided  y(s)  is  bounded  away  from  zero  in  this  interval  (which  amounts 
to  saying  y(T)  >  0). 

is  a  zero  mean  martingale,  and 

i  n  t  _ 

EZ  (t)2  =  E  —  I  /  {nJ(s) /Y(s))2Y  (s)ct(s)ds 

n  n  i-i  0  1 


E  /  (nJ(s)/Y(s) }a(s)ds 
0 


/(l/y(s) }a(s)ds  =  a(t)2.  (5.15) 

0 
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It  is  a  well  known  fact  by  now  that  Z^(t)  -*•  Z(t),  a  Gaussian  martingale  with 
Var  Z(t)  ■  o(t)2.  This  result  combined  with  a  natural  estimator  of  o( t)  is 
widely  used  to  get  pointwise  or  simultaneous  confidence  bands  for  A(t)  and 
a  fortiori  (via  F(t)  *  1  -  exp{-A(t)})  for  F(t),  see  for  example  Andersen 
and  Borgan  (1985)  and  Hjort's  (1985a)  discussion  contribution. 

The  methods  and  results  of  the  present  paper  make  it  possible  to  obtain 
the  asymptotic  skewness  of  the  Nelson-Aalen  estimator.  The  formulae  in  A. Ill 
can  be  shown  to  imply 

SKEW{A(t) }  *  SKEW{Z  (t)} 
n 

■  —  [  /  {l/y(s)2}a(s)ds  +  a(t)'+]  /  a(t)3  +  0(n-1).  (5.16) 

0  2 

A 

The  slightly  alarming  consequence  is  that  the  distribution  of  A(t)  always  is 
skewed  to  the  right  (a  lower  bound  for  the  skewness  is  (A(t)  1  +  -j)a(t)/*^i 
+  0(n  1) ,  using  the  Cauchy-Schwarz  inequality).  The  traditional  first  order 
asymptotic  statements  effectively  ignore  this  positive  skewness.  There  are 
ways  of  "repairing  for  skewness"  now  that  it  has  been  detected.  These  matters 
will  perhaps  be  returned  to  at  a  later  occasion. 
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APPENDIX 


A. I .  Prel lminar les . 

Some  of  the  more  involved  technical  arguments  that  were  needed  several  places 

in  Sections  2-4  are  tended  to  here.  We  shall  not  hesitate  to  make  convenient 

assumptions  as  long  as  they  are  statistically  reasonable.  Thus  we  postulate 

at  once  that  the  covariates  are  uniformly  bounded,  i.e. 

| £  M  for  every  z^  (A.l) 

for  some  large  enough  M.  Following  Andersen  and  Gill  (1982)  we  assume  through- 

(k)  in  k 

out  the  Appendix  that  S  (s,8)  =  —  (s)exp(BZj)  converges  to  s  me 

(k) 

s  (s,B)  uniformly  for  s  in  [0.  T]  and  for  8  in  a  neighbourhood  of  each 


reference  point  8q,  in  probability,  k  =  0,  1,  2,  3.  This  is  indeed  a  conse¬ 
quence  of  the  reasonable  assumption  that  the  empirical  distribution  of  the 
first  n  z^'s  converges  to  some  distribution  on  [-M,  M]  and  that  the  empirical 
distribution  of  the  first  n  c^'s  converges  to  some  distribution  R  on  [0,  °°] 

with  the  property  that  R[T,  <*>]  >  0.  (No  censoring  at  all  corresponds  to  R{“>) 

(k) 

=  1.)  The  limit  functions  s  (s,8)  are  continuous  in  8,  so  that  supt<T 
|S^  \s,B)  -  s^k^(s,8o)|  -*■  0  too.  Finally  s^^(s,B)  is  bounded  away  from  zero 
for  s  _<  T  and  for  8  in  a  neighbourhood  of  8q- 

Since  we  aim  at  second  order  asymptotic  results  we  must  ask  for  even  more 


than  mentioned  above.  Define 

cn(t)  =  n*5  (S(k)(t,8)  -  s(k)(t,S)) 

-  n2  (S(k)(t,8)  -  ES(k)(t,8)J  +  n*5  {J  Z  (z  )ky  (t)exp(Bz.)  -  s(k)(t, 

j-1  J  3  3 


Cn,l(t)  +  Cn,2(t)’ 


(A.  2) 


writing  y  (t)  =  EY  (t)  =  Pr(X.  _>  t,  c.  _>  t}.  Assume  for  the  moment  that  the 
J  J  J  J 

Cj’s  are  drawn  independently  from  the  distribution  R;  the  case  with  known 
censoring  times  can  be  handled  similarly  but  with  slightly  heavier  notation. 


A  fair  amount  of  details  lead  to 


1  n  2k  1  n  2k 

±—  Z  (z  )  exp(28z,)<S  (s,t)  -  Z  (z.)  exp(26z  )S.(t,u) 

n  j«l  J  J  J  n  j  =  1  J  J  J 

1  n  2k  2 

+  2  {—  Z  (z  )  exp(28z  )  <$  ,(s,t)6  (t,u)}  ,  s  _<  t  <_  u, 

n  j»l  J  J  J  J 


where  6^(s,t)  -  y^(s)  -  (t)  =  F  [s,«)R[s,«)  -  F^ [t ,«)R[t,«)  <_  F^  [s,t)  +  R[s,t) 

exp(8QZ^ ) {A(t)  -  A(s)}  +  R[s,t).  Hence  the  above  expectation  is 

<  3  M2kexp(2 1 8 1 M)  [exp( | 6 |M) {A(t)-A(s) }  +  R[s,t)] 

M2kexp(2|8|M)  [exp( | 8 Jm) (A(u) -A(t) }  +  R[t,u)]. 

This  can  be  used  to  prove  that  {C  . }  is  tight,  following  the  arguments  in  the 

n  ,1 

proof  of  Theorem  15.6  in  Billingsley  (1968).  Similarly  (C  }  is  tight.  All  of 

n>2 

the  arguments  used  can  be  made  uniform  over  bounded  sets  of  values  of  8.  The 
conclusion  we  need  from  all  this  is 


sup6cS  s“P0<t<T  |s(k)  (t  ,8)  -  s(k)(t,8)|  -  0  (rfV  (A. 3) 

In  fact  cn(*)  will  converge  to  a  zero  mean  Gaussian  process  in  D[0,  T] .  A 
final  consequence  of  the  efforts  above  is 

sup0<J.<T  |s(k)(t,8)  -  s(k)(t,8o)|  -  Op(n"V  (A.4) 


A. II.  The  convergence  of  /Hn(s)dA(s) . 

The  lemma  stated  as  (3.9)  was  needed  several  times  in  the  course  of  Section  3 
and  is  also  needed,  along  with  an  error  estimate,  in  connection  with  the 
problem  of  estimating  the  accerelation  factor  a  =  with  sufficient  pre¬ 

cision.  The  proof  below  is  some  steps  longer  than  actually  needed  to  show 
convergence  in  probability  since  we  in  some  cases  shall  need  to  know  that  the 
convergence  rate  is  n  . 


Lemma  A. 1.  Let  H  (.)  be  a  predictable  and  bounded  stochastic  process, 
n 

and  assume  that  H  converges  to  some  h  uniformly  in  probability.  Then 
n 

/nH  (s)dA(s)  S.  /0h(s)dA(s).  Furthermore,  if  sup.  |h  (s)  -  h(s)  |  «  0  (n  ”*) , 

n  LKg  <  l  n  P 

then  supQ<t<T|/0  Hn(s)dA(s)  -  /q  h(s)dA(s) |  ■  0p(n  2) . 

Proof :  Writing  N  *  =  ^i=lMi’  we  deduce  from  dN^(s)  =  dM^(s)  + 

Y.(s)exp(8  z.)dA(s)  that  dN(s)/n  =  dM(s)/n  +  S^(s,8  )dA(s).  Hence  by  a  Taylor 
i  o  I  o 

expansion  argument 


T  ^  T  dN(s)/n 

/  H  (s)dA(s)  -  /  H  (s) 


0  n 


»  ”  S(0)(S,B) 


=  /  H  (s){- 


S(1)(s,6  ) 


O'  „  x  ,  i,92  1  ,a  a  \1\  dM(s) 


0  n 
T 


Sw'(s,6  )  S""(s,6  ) 

o  o 


(0),_  0  x  c(0)/-  x"7  (e  ■  Bo}  +  V„«»,  ^  -  o'  n 


—  (6  -  B  )2} 


S(1)(s,Bn)  „  g  2 

+  f  H  (s)  {1  -  (g  .  e  )  +  'tt-frX 

°  n  S(0)(s,B)  °  36  S(0)(s,B) 


S""(s,e) 

— —  (3  -  B  )2S(0)(s,B  )}  dA( s) 


(i)  +  (ii)  +  (Hi)  +  (iv)  +  (v)  +  (vi)  , 


say,  with  8  somewhere  between  Bq  and  8. 

(i)  is  the  martingale  L(t)  *  n  ^  fn  H  (s)S^(s,8  )  ^dM(s)  evaluated  at  T 

n  o 

and  divided  by  /n.  The  martingale  has  variance  process 

(L,L) (t)  *  -  I  /  H  (s)2S(0)(s,8  )~2  Y  (s) exp(8  z.)dA(s) 
n  i*l  0  n  o  i  o  i 

-  /  H  (s)2S(0)(s,B  )_1  dA(s) . 

0  n  o 


Lenglart's  inequality,  see  for  example  Andersen  and  Borgan  (1985,  Section  3), 
implies 

Pr{sup0<t<T  | L ( t )  |  >  n)  1  6/n2  +  Pr{(L,L) (T)  >  6} 


for  all  positive  6  and  n.  Choosing  6  big  enough  to  make  the  second  term  less  than 


a  prescribed  c  and  afterwards  an  even  bigger  n  to  get  d/n2  less  than  t  too,  we 

see  that  sup0<t<T  |i(t)|,  and  in  particular  L(T) ,  is  bounded  in  probability. 

Consequently  (i)  -  0  (n  ). 

P 

A  •.I-'  1 

(ii)  is  similarly  (0  -  6  )  times  0  (n~ J) ,  i.e.  (ii)  *  0  (n  ) .  (iii) 

T  _  °  p  P 

is  (0  -  8q)2  times  /q  Kn(s)dM(s)/n,  say,  where  K  is  bounded  in  probability. 

_  1  n  T 

But  Kn<s)dM(s)/n!  j:i=1  /0  |Kn(s)  |{dNi(s)  +  Yi(s)exp(0ozi)dA(s)}  < 

U  +  A(T)  i  Z1=1exp(0ozi)>  supQ<s<T  jK^fs) | -  Hence  (iii)  =  0p(n_1). 

T 

(iv)  is  of  course  within  |/Q  {Hn(s)  -  h(s)}dA(s)|  <_  A(T)  supQ<s<T  |Hn(s) 
-  h(s)  |  of  the  limit  / .  h(s)dA(s)  ,  and  is  accordingly  0  (n  ”*)  under  the  extra 

p 

assumption  stated  in  the  lemma,  (v)  *  0  (n  )  for  reasons  similar  to  those 

P 

explained  in  connection  with  (iii).  Finally  (vi)  -  0  (n_1) .  □ 

P 

T 

There  are  also  occasions  where  fn  H  (s)dA(s)  needs  to  be  studied  for 

u  n 

processes  that  are  not  predictable.  The  most  important  of  these  cases  are 
of  the  type  Hn(s)  *  H^Cs.S),  say,  where  Hn(s,3o)  is  predictable  and  converges 
uniformly  in  probability  to  some  h(s,0o).  (H^Cs.B)  is  not  predictable  since 
it  depends  upon  B  which  is  not  even  measurable  w.r.t.  the  history  up  to  time  s.) 
Then  another  Taylor  expansion  saves  the  day: 

T  ,  „  T 

f  H  (s.B)dA(s)  =  /  {H (s,8  )  +  ■—  H  (s,6  )  (0  -  0  )  + 

0“  0  n  0  *1  O  o 

+  kW  Hn(s’6)  "  e0>2>  dA(s), 

and  these  terms  can  be  shown  to  be  h(s,B  )dA(s)  +  0  (n~S  +  0  (n“S  +  0  (n_1) 

u  o  p  P  p 

under  reasonable  restrictions. 


A. III.  The  skewness  of  a  martingale. 


To  arrive  at  suitable  expressions  for  the  acceleration  factor  a  *  a  /^n  we 

o 

needed  the  skewness  of  the  random  variable 
1  n  T 

U  “  ' —  £  /  {z  -  E(s,0  )}  {dN  (s)  -  Y  (s)exp(B  z  )a(s)ds}.  (A. 5) 

4i.l0  1  0  1  1  01 

The  formulae  derived  below  are  also  useful  in  other  statistical  problems. 

They  can  be  used  to  obtain  estimates  of  the  skewness  of  parameter  estimators 
in  models  with  censoring,  pointing  to  the  possibility  of  correcting  first 
order  asymptotic  statements  in  such  situations,  and  can  also  be  used  to 
obtain  the  skewness  of  nonparametric  estimators  like  the  Nelson-Aalen  and 
the  Kaplan-Meier  estimators. 

Let  in  general  terms  N^,...,Nn  be  0-1  counting  processes  observed  over  the 
time  interval  [0,  T]  with  intensities  of  the  form  Y^(s)dA^(s) , . . . ,Y  (s)dAn(s) ,  i.e. 

t 

M±(t)  =  N±(t)  -  /  Y1(s)dAi(s),  t  >  0,  i  -  1 . n  (A.6) 

become  martingales  w.r.t.  the  o-algebras  *  o{N^(s),  Y^s);  s  ^  t}.  We  take  the 
at-risk  indicators  Y^  to  be  non-increasing  left  continuous  0-1  processes,  and 


Finally  let  dA^(s)  =  a^(s)ds,  i.e.  A^  is  absolutely  continuous,  i  =  l,...,n. 

Lemma  A. 2.  Let  h,,...,h  be  a.s.  bounded  and  predictable  processes.  (It 
suffices  for  h^  to  be  a  caglad  function,  i.e.  the  paths  are  left  continuous 
with  right  hand  limits,  and  to  be  progressively  measurable  w.r.t.  the  Ft's. 

The  interpretation  is  that  h^s)  is  known  already  at  time  s-e  for  small  enough  e.) 
Consider 

T 

Z,  =  f  h,(s)dM.(s),  i  =  1, — ,n.  (A. 7) 

l  0  1  1 

Then  Z^,...,Zn  are  orthogonal  and  the  following  formulae  are  true: 


(A. 8) 


n  T 


E(  l  Z.)2  -EE  /  h  (s) 2Y  (s)dA  (s) , 

i  0  i  i  i 


i-1  *  i-1  0 


T  t 


E(Z,3)  =  E  /  h  (s) 3Y  (s)dA  (s)  -  3E/  /  h,(s)dA,(s)  h.  (t) 2Y  (t)dA. (t) 
i  niii  0  0  1  *  1  1  1 


T  t 


E(Z,Z  2)  -  E  /  /  h  (s)dM  (s)  h. (t)2Y, (t)dA, (t) ,  i  4  j, 
1  J  OO1  1  J  J  J 


(A.  9) 
(A. 10) 


n  T 


E(  E  Z.)3  =  E  E  /  h.(s) 3Y.(s)dA.(s) 
i-1  1  i-1  0  1  1  1 


T  t  n 


+  3  E  /  /  E  h. (s)dM  (s)  E  h. (t) 2Y, (t)dA,  (t)  .  (A. 11) 

0  0  1=1  i  1  j=l  3  ^  i 

Proof :  may  be  a  function  of  and  Y  values  for  j  4  i,  as  is  the 

case  for  U  of  (A. 5).  Z^  and  Z^  may  accordingly  be  dependent,  but  they  are 
nevertheless  orthogonal  martingales  (here  evaluated  at  the  endpoint  T) .  This 
and  formula  (A. 8)  are  standard  facts,  see  for  example  Andersen  and  Borgan 
(1985)  or  Gill  (1980). 

Think  of  as  a  Lebesgue-Stielt jes  integral, 

Z,  '  Z, '  -  E  h  (s  )dM  (s  ), 
l  i  i  u  i  u 


where  dM  (s  )  -  dN  (s  )  -  Y  (s  )dA  (s  )  =  N  [s  ,s  )  -  Y  (s  )A  [s  ,s  ) 
iu  iu  iuiu  iu  u+1  i  u  i  u  u+1 

*  M, (s  ,,)  -  M(s  ),  for  a  fine  grid  0=s  <  ...  <  s  =  T.  Given  the  history 

i  u+1  i  u  o  m 

of  everything  happened  in  [0,  s) ,  i.e.  ,  Y^(s)  is  known  and  N^[s,s+ds)  is 
binomial  (Y^(s),  A^[s,s+ds)}.  This  can  be  used  to  evaluate  the  expectation  of 

(Z,’)3  =  E  h  (s  ) 3dM. (s  )3  +  3  E  h,(s  )dMJ(s  )  h.(s  )2dM.(s  )2 
l  iuiu  iuiuiviv 


u<v 


+3  E  h,(s  )2dM, (s  )2  h. (s  )dM, (s  ) 

^  i  u  iu  iviv 
u<v 

+6  E  h  (s  )dM  (s  )h.(s  )dM  (s  )h  (s  )dM  (s  ) 
iu  iuiv  iviw  iw 
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Here  E(h  (s  )3dM  (s  )3|F  }  -  h  (s  )3Y  (s  )  [{1-dA  (s  )}3dA  (s  )  -  dA.(s  )3 

iu  iu  u-  iuiu  iu  iu  iu 

{1-dA  (s  )}]  -  h  (s  )3Y  (s  )dA  (s  ), 
iu  iu  iu  iu 

E(h.(s  )dM.(s  )h  (s  )2dM((s  )2|f  } 

iu  luiv  iv'v- 

=  hi(su)dMi(su)h1(sv)2Y.(sv)  {{l-dAi(sv)}2dAi(sv)  +  dA^s^l-dA^)  }] 

=  hi(su)dMi(su)hi(sv)2Yi(sv)dAi(sv) 

=  -  hi(su)dAi(su)hi(sv)2Yi(sv)dAi(sv), 

and  similarly  E{h(s  )2dM,(s  )2h  (s  )dMJ(s  )  |F  }  =  0,  E{h.(s  )dM,(s  )h,(s  ) 
iu  iuiviv'v-  iuiuiv 

dMi(sv)hi(sw)dMi(sw) !Fw-}  =  °-  Hence 

E(Z  ' ) 3  =  E  I  h  ( s  ) 3Y  (s  )dA  (s  )  -  3  E  Z  h  (s  )dA(s  )h,(s  )2Y,(s  )dA  (s  ) 
i  iuiuiu  ,  iuiuiviviv 

u  u<v 

Formula  (A. 9)  follows  by  appropriate  limiting  arguments.  Rigor  can  be  achieved 

by  first  proving  the  formula  for  h^  a  predictable  step  function  and  next  for 

a  general  a.s.  bounded  predictable  process,  compare  the  abstract  way  in  which 

such  processes  are  defined  in  Meyer  (1971). 

Next  look  at 
n  n 

(1  Z.)3  =  Z  Z.3  +  3  I  Z  Z  2  +  6  Z  Z.ZZ. 


i=l  1  i=l 


i<j  <k 


i  j  k 


Using  arguments  similar  to  those  above  formula  (A. 10)  can  be  proved,  whereas 

E(Z,Z.Z  )  =  0  when  the  indices  are  distinct.  It  follows  that 
i  j  k 


n  T  t 


E(  Z  Z  )3  =  E 
i=l  1 


{Z  /  h.(s)3Y.(s)dA.(s)  -  3  Z  //  h.(s)dA.(s)h.(t)2Y,(t)dA.(t) 
i=1  0  1  i  i  i=100i  i  i  i  i 


+  31  /  /  h.(s)dM.(s)  h.(t)2Y.(t)dA.(t)}, 

itj  0  0  i  1  J  J  J 

from  which  the  final  formula  (A. 11)  follows  upon  using  dM^(s)Y^(t) 
-  dA . (s) Y . (t)  .  □ 


Now  let  us  apply  these  formulae  to  the  variable  U  of  (A. 5)  and  (4.33). 


Firstly,  EU2  »  ER^ ,  where 


n  T 


*  —  £  /  (z,  -  E(s,B  ) }2  Y  (s)exp(B  z  )dA(s) 

2  n  ^  g  i  o  i  oi 


/  (S(2)(s,B  )  -  S(1)(s,B  )2/S(1)(s,B  ) }dA(s) . 
go  o  o 


Clearly  R^  converges  in  probability  to  an  underlying  population  parameter  p ^ 


which  is  exactly  Z  =  a2,  compare  (2.6).  The  natural  estimator  obtained  by 


A  A  A  A. 

inserting  B  and  A  for  Bq  and  A  where  necessary  is  just  Z  =  a2  of  (2.9).  The 


-!< 

assumptions  and  results  of  A. I  imply  ER.  *  a2  +  0(n  ) ,  R„  =  o2  *  a2  +  0  (n  2)  , 

i  2  p 


and  hence  o2  =■  ER^  +  0^(n  2) ,  i.e.  a2  is  a  good  enough  estimate  of  EU2. 


Secondly,  EU3  =  ER^/v'xT,  where 


n  T 


R,  *  —  £  f  (z  -  E(s,B  )}3  Y  (s)exp(B  z  )dA(s) 
J  n  M_.  o  i  °  1  o  i 


i=l 


T  t  n 


1  U  «  44 

+  3  —  /  /  £  { z4  -  E(s,Bq) >dMi(s)  Z  {zA  -  E(t,g) }2Y^ (t)exp(B^z^ )dA(t) 


n  0  0  i 


j-1 


o  3 


=  R3'  +  R3"  . 


o  j‘ 
(A. 12) 


The  following  considerations  will  show  that  there  are  certain  population 
3\  P3‘ 


parameters  p  ',  p,"  such  that 


lc 

ER3'  =  P3'  +  0(n  2)  , 
so  that 


ER. 


It  _  .  tt 


P,"  +  0(n 


(A. 13) 


EU3  =  —  (p_ ’  +  p,"  )  +  0(n~1) , 
y-3  3 

/n 


(A. 14) 


P3'  and  P3"  can  be  expressed  in  terms  of  the  limit  functions  s^(s,B),  k 
0,  1,  2,  3;  in  fact  and  for  the  record 


T  T 

p  '  -  /  g' (s,g  )dA(s) ,  p  "  -  -3  /  g"(s,8  )dA(s) , 


(A. 15) 


V.V  vw  V  V.V.  VV.".-''  .  -  . 


1  n 

g'(s,8  )  -  p-limit  of  —  Z  {z.  -  E(s,B  )}3  Y  (s)exp(8  z  ),  (A. 16) 

j  o  n  ,  .  l  ,o  l  oi 


-  -  “1-1 

n  f 

g  "(t,8  )  *  p-limit  of  —  I  /  {z  -E(s,B  )}exp(8  z,)dA(s){z  -E(t,B  )}2exp(B  z  ). 
jo  n.,ni  o  ol  l  o  oi 

i-1  u 

(A. 17) 

The  specific  form  of  these  population  parameters  need  not  concern  us  here, 

U 

what  is  important  is  to  get  n  -consistent  estimates  of  them. 

p^',  clearly  the  limit  in  probability  of  R^' ,  is  easy  to  handle.  The 
natural  estimator  is 


A  -  U  i 

R,'  *  —  l  f  {z.  -  E(s,8))3  Y  (s)exp(8z.)dA(s). 
j  n  o  1  1  1 


(A. 18) 


The  results  of  A. I  and  A. II  imply  ER^’  “  P^'  +  0(n  2) ,  R^'  =  P3*  +  °p^n  2) » 

so  R  '  =  ER  '  +  0  (n  ^)  is  good  enough  for  ER,'.  Next  consider  ER  "  and  p  ". 

J  o  p  J  J  J 

Write 

n  t 

U  (t)  -  —  I  /  {z  -  E(s ,8  )}  dM.(s), 
n  ^  i=l  0  1  °  1 

B  ( t )  =  Vn  [—  Z  {z  -E(t,6 J  }2Y  (t)exp(g  z.)  -  {s(2)(t,6  )-s(1)(t,8  )2/s(0)(t,B  )}] 
n  n  j=1  j  o  j  o  j  o 

so  that 


ER."  =  3  E  /  U  (t)B  (t)  dA(t) . 
3  0  n  n 


(A. 19) 


Here  {U  (t) ,  B  (t) }  will  in  fact  converge  jointly  to  some  Gaussian  (U(t) ,  B(t)}, 
n  n  “ 

T  T 

/.  U  (t)B  (t)  dA(t)  will  converge  in  distribution  to  /.  U(t)B(t)  dA(t) ,  and 
u  n  n  u 


ER,"  -►  p,"  =  3  /  EU(t)B(t)  dA(t) 
3  3  0 


(A. 20) 


gives  another  interpretation  of  the  population  parameter  p^". 

We  shall  have  to  be  somewhat  circumventive  now,  due  to  the  fact  that 
although  EdM^sjY^Ct)  =  0  for  i  +  j,  EdM^Cs)  {z^  -  E(t ,  8q)  )2Y^  (t)  may  still  be 
different  from  zero.  If  B°(t)  is  as  B  (t)  above,  but  with  the  deterministic 


limit  function  e(t,B  )  »  s  (t,B  )/s  (t,B  )  replacing  E(t,B  ),  then 

o  o  o  o 

B  (t)  -  B°(t)  -  -  Jn  {E(t,B  )  -  e(t ,0  )}2  S(0)(t,6  ), 
an  o  o  o 

and  supt<T  ~  B°(t) |  =  °p(n  ^  according  to  A. I.  Using  this  one  may  show 

that 

T  j, 

ER,"  »  3  E  /  U  (t)B°(t)  dA(t)  +  0(n~*), 

3  0  n  n 

compare  (A. 19),  and  this  latter  integral  is  easier  to  handle  since  Etz^  -  E(s,Bq)} 

dM^Cs)  {z^  -  e(t, Bq) }2Yj (t)  =  0  when  i  +  j.  One  ends  up  with 

1  n  ^  j, 

ER  "  -  3  E-  Z  /  /  (z  -E(s,3  )  }dM  (s)  {z  -E(t,  B  )  }2Y,  (t)exp(B  z.)dA(s)  +  0(n  2) 

3  n  00l  oil  oiroi 

and  since  dM^(s)Y^(t)  ■  -dA^(s)Y^(t)  the  claim  about  the  expression  Cor  p^" 

in  (A. 15),  (A. 17)  follows.  More  importantly,  it  also  follows  that 

-  n  T  t  ^  ^  ^  A 

R_"  *  -  3  —  Z  /  /  (z  -E(s,B) )exp(Bz .)dA(s) {z  -E(t,B) )2Y  (t)exp(Bz  )dA(t) 

3  n  0  0  1  1  1  1  i 

(A. 21) 

A  — L 

has  the  property  that  R^"  *  P^"  +  0p(n  )  =  ER^"  +  °p(n  '  •  (The  integration 
takes  place  on  0  <  s  <  t  <  T,  as  opposed  to  0^s_<t<^T.) 

We  may  conclude  that 


EU3  =  —  (R  ’  +  R ,")  +  0  (n_1) 
/n  3  3  P 


Ca. 22) 


and  that 


R  ’  +  R  " 

SKEW  {U}  =  —  — - —  +  0  (n  ). 


(A. 23) 
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